Abstract: In this paper, we address the problem of content
placement in peer-to-peer systems, with the objective of
maximizing the utilization of peers' uplink bandwidth resources. We
consider system performance under a many-user asymptotic. We 
distinguish two scenarios, namely "Distributed Server Networks" 
(DSN) for which requests are exogenous to the system, and "Pure 
P2P Networks" (PP2PN) for which requests emanate from the 
peers themselves. For both scenarios, we consider a loss network 
model of performance, and determine asymptotically optimal 
content placement strategies in the case of a limited content 
catalogue. We then turn to an alternative "large catalogue" 
scaling where the catalogue size scales with the peer population. 
Under this scaling, we establish that storage space per peer 
must necessarily grow unboundedly if bandwidth utilization is 
to be maximized. Relating the system performance to properties 
of a specific random graph model, we then identify a content 
placement strategy and a request acceptance policy which jointly 
maximize bandwidth utilization, provided storage space per peer 
grows unboundedly, although arbitrarily slowly, with system size. 



I. Introduction 

The amount of multimedia traffic accessed via the Internet, 
already of the order of exabytes ($10^{18}$ bytes) per month, is expected
to grow steadily in the coming years. A peer-to-peer (P2P) 
architecture, whereby peers contribute resources to support 
service of such traffic, holds the promise to support such 
growth more cheaply than by scaling up the size of data 
centers. More precisely, a large-scale P2P system based on 
resources of individual users can absorb part of the load that 
would otherwise need to be served by data centers. 

In the present work we address specifically the Video-on- 
Demand (VoD) application, for which the critical resources 
at the peers are storage space and uplink bandwidth. Our 
objective is to ensure that the largest fraction of traffic is 
supported by the P2P system. More precisely, we look for 
content placement strategies that enable content downloaders 
to maximally use the peers' uplink bandwidth, and hence max- 
imally offload the servers in the data centers. Such strategies 
must adjust to the distinct popularity of video contents, as a 
more popular content should be replicated more frequently. 

We consider the following mode of operation: Video re- 
quests are first submitted to the P2P system; if they are 

^1 Part of the results developed in this paper were presented as a "brief
announcement" in [12] and shown in more detail in [13].




(a) Distributed Server Network (b) Pure Peer-to-Peer Network

Fig. 1 : Two architectures of P2P VoD systems 



accepted, uplink bandwidth is used to serve them at the 
video streaming rate (potentially via parallel substreams from 
different peers). They are rejected if their acceptance would 
require disruption of an ongoing request service. Rejected 
requests are then handled by the data center. Alternative modes 
of operation could be envisioned (e.g., enqueueing of requests, 
service at rates distinct from the streaming rate, joint service 
by peers and data center,...). However the proposed model is 
appealing for the following reasons. It ensures zero waiting 
time for requests, which is desirable for VoD application; 
analysis is facilitated, since the system can be modeled as 
a loss network [7], for which powerful theoretical results are
available; and finally, as our results show, simple placement 
strategies ensure optimal operation in the present model. 

In the P2P system we are considering, there are two kinds 
of peers: boxes and pure users. Their difference is that boxes 
do contribute resources (storage space and uplink bandwidth) 
to the system, while pure users do not. This paper focuses on 
the following two architectures (illustrated in Figure 1):

• Distributed Server Network (DSN): Requests to down- 
load contents come only from pure users, and can be 
regarded as external requests. 

• Pure P2P Network (PP2PN): There are no pure users 
in the system, and boxes do generate content requests, 
which can be regarded as "internal". 

The rest of the paper is organized as follows: We review
related work in Section II and introduce our system model in
Section III. For the Distributed Server Network scenario, the
so-called "proportional-to-product" content placement strategy
is introduced and shown to be optimal in a large system limit
in Section IV, where extensive simulation results are also
provided. For the Pure P2P Network scenario, a distinct placement
strategy is introduced and proved optimal in Section V. These
results apply for a catalogue of contents of limited size. An
alternative model in which catalogue size grows with the user
population is introduced in Section VI, where it is shown
that the "proportional-to-product" placement strategy remains
optimal in the DSN scenario in this large catalogue setting,
for a suitably modified request management technique.

II. Related Work

The number and location of replicas of distinct content 
objects in a P2P system have a strong impact on such system's 
performance. Indeed, together with the strategy for handling 
incoming requests, they determine whether such requests 
must either be delayed, or served from an alternative, more 
expensive source such as a remote data center. Requests which
cannot start service at once can either be enqueued (we then 
speak of a waiting model) or redirected (we then speak of a 
loss model). 

Previous investigations of content placement for P2P VoD
systems were conducted by Suh et al. [11]. The problem tackled
in [11] differs from our current perspective; in particular, no
optimization of placement with respect to content popularity
was attempted in that work. Performance analysis of both
queueing and loss models is considered in [11]. Valancius
et al. [17] considered content placement dependent on content
popularity, based on a heuristic linear program, and validated
this heuristic's performance in a loss model via simulations.

Tewari and Kleinrock [14], [15] advocated tuning the
number of replicas in proportion to the request rate of the 
corresponding content, based on a simple queueing formula, 
for a waiting model, and also from the standpoint of the load 
on network links. They further established via simulations that 
Least Recently Used (LRU) storage management policies at 
peers emulated rather well their proposed allocation. 

Wu et al. [18] considered a loss model, and a specific
time-slotted mode of operation whereby requests are submitted
to randomly selected peers, who accommodate a randomly 
selected request. They showed that in this setup the optimal 
cache update strategy can be expressed as a dynamic program. 
Through experiments, they established that simple mecha- 
nisms such as LRU or Least Frequently Used (LFU) perform 
close to the optimal strategy they had previously characterized. 

Kangasharju et al. [6] addressed file replication in an
environment where peers are intermittently available, with the aim
of maximizing the probability of a requested file being present
at an available peer. This differs from our present focus in that
the bandwidth limitation of peers is not taken into account, 
while the emphasis is on their intermittent presence. They 
established optimality of content replication in proportion to 
the logarithm of its popularity, and identified simple heuristics 
approaching this. 



Boufkhad et al. [3] considered P2P VoD from yet another
viewpoint, looking at the number of contents that can be 
simultaneously served by a collection of peers. 

The content placement problem has also been addressed with
other optimization objectives. For example, Almeida
et al. [1] aim at minimizing the total delivery cost in the network,
and Zhou et al. [19] target jointly maximizing the average
encoding bit rate and the average number of content replicas
as well as minimizing the communication load imbalance of
video servers.

The cache dimensioning problem is considered in [9], where
Laoutaris et al. optimized the storage capacity allocation
for content distribution networks under a limited total cache
storage budget, so as to reduce the average fetch distance for
the requested contents, taking into account load balancing and
workload constraints on a given node. Our paper takes a
different perspective, focusing on many-user asymptotics; our
results show that the finite storage capacity per node is
never a bottleneck (even in the "large catalogue model", where it
scales to infinity, but more slowly than the system size).

There are obvious similarities between our present objective 
and the above works. However, none of these identifies explicit 
content placement strategies at the level of the individual peers, 
which lead to minimal fraction of redirected (lost) requests in 
a setup with dynamic arrivals of requests. 

Finally, there is a rich literature on loss networks (see in 
particular Kelly [7]); however, our present concern of
optimizing placement to minimize the amount of rejected traffic in a
corresponding loss network appears new. 



III. Model Description 

We now introduce our mathematical model and related 
notations. Denote the set of all boxes as $\mathcal{B}$. Let $|\mathcal{B}| = B$ and
index the boxes from 1 to $B$. Box $b$ has a local cache $\mathcal{J}_b$ that
can store up to $M$ contents, all boxes having the same storage
space $M$. We further assume that each box can simultaneously
serve $U$ concurrent requests, where $U$ is an integer, i.e., each
box has an uplink bandwidth equal to $U$ times the video
streaming rate. In particular we assume identical streaming
rates for all contents.

The set of available contents is defined as $\mathcal{C}$. Let $|\mathcal{C}| = C$
and index contents from 1 to $C$. Thus a given box $b$ will be
able to serve requests for content $c$ for all $c \in \mathcal{J}_b$.

In a Pure P2P Network, when box b has a request for 
a certain content c, which is coincidentally already in its 
cache, a "local service" is provided and no download service 
is needed, hence the service to this request consumes no 
bandwidth resource. The effect of local service on deriving 
an optimal content placement strategy will be discussed in 
detail in Section V.

In a Distributed Server Network, however, local service will 
never occur since all the requests are external with respect to 
the system resources.^2

For a new request for content $c$ that needs a download service, an
attempt is made to serve this request by some box holding
content $c$, while ensuring that previously accepted requests
can themselves be assigned to adequate boxes, given the cache 
content and bandwidth resources of all boxes. This potentially 
involves "repacking" of requests, i.e., reallocation of all the 
bandwidth resources in the system ("box-serving-request" 
mapping) to accommodate this new download demand pattern. 
If such repacking can be found, then the request is accepted; 
otherwise, it is rejected from the P2P system. 

It will be useful in the sequel to characterize the concurrent
numbers of requests that are amenable to such repacking. Let
$n = \{n_c\}_{c \in \mathcal{C}}$ be the vector of numbers $n_c$ of requests per
content $c$. Clearly, a matching of these requests to server boxes
is feasible if and only if there exist nonnegative integers $z_{cb}$
(number of concurrent downloads of content $c$ from box $b$)
such that

$$\sum_{b:\, c \in \mathcal{J}_b} z_{cb} = n_c, \quad \forall\, c \in \mathcal{C}; \qquad \sum_{c:\, c \in \mathcal{J}_b} z_{cb} \le U, \quad \forall\, b \in \mathcal{B}. \tag{1}$$

A more compact characterization of feasibility follows by an
application of Hall's theorem [2] (detailed in Appendix B),
giving that $n$ is feasible if and only if:

$$\forall\, S \subseteq \mathcal{C}, \quad \sum_{c \in S} n_c \le U \, |\{b \in \mathcal{B} : S \cap \mathcal{J}_b \ne \emptyset\}|. \tag{2}$$
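
As an illustration, the two feasibility characterizations can be compared on small instances. The following is a minimal Python sketch, not the procedure of Appendix B: the function names, the dictionary representation of $n$, and the use of Kuhn's augmenting-path algorithm to test condition (1) are assumptions made for this example. The subset enumeration for condition (2) is exponential in the number of requested contents, so it is only practical for toy systems.

```python
from itertools import combinations

def hall_feasible(n, caches, U):
    """Check condition (2): every content subset S fits in the boxes touching S.

    n: dict content -> number of ongoing requests; caches: list of sets J_b.
    """
    # Contents with zero requests can be ignored: adding them to S only
    # enlarges the set of boxes on the right-hand side.
    active = [c for c, count in n.items() if count > 0]
    for size in range(1, len(active) + 1):
        for S in combinations(active, size):
            demand = sum(n[c] for c in S)
            boxes = sum(1 for cache in caches if cache & set(S))
            if demand > U * boxes:
                return False
    return True

def assignment_feasible(n, caches, U):
    """Check condition (1) directly by matching requests to upload slots."""
    requests = [c for c, count in n.items() for _ in range(count)]
    slots = [(b, u) for b in range(len(caches)) for u in range(U)]
    owner = {}  # upload slot -> index of the request it currently serves

    def try_assign(r, seen):
        # Look for a free slot, or recursively move ("repack") another request.
        for slot in slots:
            if requests[r] in caches[slot[0]] and slot not in seen:
                seen.add(slot)
                if slot not in owner or try_assign(owner[slot], seen):
                    owner[slot] = r
                    return True
        return False

    return all(try_assign(r, set()) for r in range(len(requests)))
```

For example, with `caches = [{1, 2}, {2, 3}]` and `U = 1`, the vector `{1: 1, 3: 1}` is feasible, whereas `{1: 2}` is not, since only one box holds content 1.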

We now introduce statistical assumptions on request arrivals 
and durations. New requests for content $c$ occur at the instants
of a Poisson process with rate $\nu_c$. We assume that the video
streaming rate is normalized to 1, and is the same for all
contents. We further assume that all videos have the same
duration, again normalized to 1. Under these assumptions, the
amount of work per time unit brought into the system by
content $c$ equals $\nu_c$.

With the above assumptions at hand, assuming fixed cache 
contents, the vector n of requests under service is a particular 
instance of a general stochastic process known as a loss 
network model. Loss networks were introduced to represent 
ongoing calls in telephone networks, and exhibit rich structure. 
In particular, the corresponding stochastic process is reversible, 
and admits a closed-form stationary distribution. For the 
Distributed Server Network model, the stationary distribution 
reads:

$$\pi(n) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \frac{\nu_c^{n_c}}{n_c!} \, \mathbb{1}\{n \text{ is feasible}\}. \tag{3}$$

In words, the numbers of requests $n_c$ are independent Poisson
random variables with parameter $\nu_c$, conditioned on feasibility
of the whole vector $n$.
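
For very small systems, the product form (3) can be evaluated by brute-force enumeration. The sketch below is only an illustration under stated assumptions: contents are indexed from 0, the function names are hypothetical, feasibility is tested through condition (2), and the rejection probability is obtained from the PASTA property (a Poisson arrival sees the stationary state, and is rejected when adding it would leave the feasible set). The enumeration grows exponentially with the number of contents.

```python
from itertools import combinations, product
from math import factorial

def feasible(n, caches, U):
    """Hall-type condition (2) for a tuple n indexed by content."""
    contents = range(len(n))
    for size in range(1, len(n) + 1):
        for S in combinations(contents, size):
            boxes = sum(1 for cache in caches if cache & set(S))
            if sum(n[c] for c in S) > U * boxes:
                return False
    return True

def stationary_distribution(nu, caches, U):
    """Enumerate feasible states and normalize the product-form weights of (3)."""
    cap = U * len(caches)  # no content can have more than B*U ongoing requests
    weights = {}
    for n in product(range(cap + 1), repeat=len(nu)):
        if feasible(n, caches, U):
            w = 1.0
            for c, count in enumerate(n):
                w *= nu[c] ** count / factorial(count)
            weights[n] = w
    Z = sum(weights.values())  # the normalizing constant Z
    return {n: w / Z for n, w in weights.items()}

def loss_probability(nu, caches, U, c):
    """Probability that an arriving request for content c is rejected."""
    pi = stationary_distribution(nu, caches, U)

    def with_arrival(n):
        return tuple(k + (i == c) for i, k in enumerate(n))

    # Rejected exactly in the states from which one more request for c is infeasible.
    return sum(p for n, p in pi.items() if with_arrival(n) not in pi)
```

With a single box caching a single content, this reduces to the Erlang B formula: for `nu = [1.0]`, `caches = [{0}]` and `U = 2`, the rejection probability is 0.5 / (1 + 1 + 0.5) = 0.2.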

^2 In fact, the external users issuing requests could keep local copies of
previously accessed content, and hence experience "local service" upon
re-accessing the same content. We need not consider this, as it happens
outside the perimeter of our system.



Our objective is then to determine content placement strate- 
gies so that in the corresponding loss network model, the 
fraction of rejected requests is minimal. The difficulty in doing 
this analysis resides in the fact that the normalizing constant Z 
is cumbersome to evaluate. Nevertheless, simplifications occur 
under large system asymptotics, which we will exploit in the 
next sections. 

We conclude this section by the following remark. For sim- 
plicity we assumed in the above description that a particular 
content is either fully replicated at a peer, or not present at
all, and that a request is served from only one peer. It should 
however be noted that we can equally assume that contents 
are split into sub-units, which can be placed onto distinct 
peers, and downloaded from such distinct peers in parallel 
sub-streams in order to satisfy a request. This extension is 
detailed in Appendix F.

IV. Optimal Content Placement in Distributed 
Server Networks 

We first describe a simple adaptive cache update strategy 
driven by demand, and show why it converges to a "prede- 
termined" content placement called "proportional-to-product" 
strategy. We then establish the optimality of this "proportional- 
to-product" placement in a large system asymptotic regime. 

A. The Proportional-to-Product Placement Strategy 

A simple method to adaptively update the caches at boxes 
driven by demand is described as follows:



Demand-Driven Cache Update 



Whenever a new request for content $c$ arrives, with probability $\epsilon B$ ($\epsilon$ is
chosen such that $\epsilon B \le 1$), the server picks a box $b$ uniformly at
random, and attempts to push content $c$ into this box's cache. If
$c$ is already there, nothing is done; otherwise, $c$ replaces a content
selected uniformly at random from the cache.
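
One step of this update rule can be sketched as follows. This is a minimal Python illustration, assuming caches are stored as a list of Python sets; the function name `demand_driven_update` and the `rng` argument are hypothetical choices made for the example.

```python
import random

def demand_driven_update(caches, c, eps, rng=random):
    """Apply one demand-driven cache update triggered by a request for content c.

    caches: list of sets (one per box); eps must satisfy eps * len(caches) <= 1.
    Returns the index of the selected box, or None if no update was attempted.
    """
    B = len(caches)
    # With probability eps * B, the server attempts a push.
    if rng.random() >= eps * B:
        return None
    b = rng.randrange(B)  # box chosen uniformly at random
    if c not in caches[b]:
        # Evict a content chosen uniformly at random, then insert c.
        evicted = rng.choice(sorted(caches[b]))
        caches[b].remove(evicted)
        caches[b].add(c)
    return b
```

Each box keeps exactly $M$ contents throughout, since an insertion is always paired with one eviction.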



Since external demands for content $c$ arrive according to a
Poisson process with rate $\nu_c$, we find that under the above
simple strategy, content $c$ is pushed at rate $\epsilon \nu_c$ into a particular
box which is not caching content $c$. Recall that each box
stores $M$ distinct contents, and let $j$ denote a candidate "cache
state", which is a size-$M$ subset of the full content set $\mathcal{C}$. For
convenience, let $\mathcal{J}$ denote the collection of all such $j$.

With the above strategy, the caches at each box evolve
independently according to a continuous-time Markov process.
The rate at which cache state $j$ is changed to $j'$, where
$j' = j \cup \{c\} \setminus \{d\}$ for some contents $d \in j$, $c \notin j$, which
we denote by $q(j,j')$, is easily seen to be $q(j,j') = \epsilon \nu_c / M$.
Indeed, content $d$ is evicted with probability $1/M$, while
content $c$ is introduced at rate $\epsilon \nu_c$.

It is easy to verify that the distribution $p(\cdot)$ given by

$$p(j) = \frac{1}{Z} \prod_{c \in j} \nu_c, \quad j \in \mathcal{J}, \tag{4}$$

for some suitable normalizing constant $Z$, verifies the following
equation:

$$p(j)\, q(j,j') = p(j')\, q(j',j), \quad j, j' \in \mathcal{J}. \tag{5}$$

The latter relations, known as the local balance equations,
readily imply that $p(\cdot)$ is a stationary distribution for the above
Markov process; since the process is irreducible, this is the
unique stationary distribution.
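
The local balance relations can also be checked numerically on a small catalogue. The following Python sketch is only an illustration: the function names are hypothetical, contents are indexed from 0, and the transition rate is taken to be $\epsilon \nu_c / M$ for a swap that brings content $c$ in, as derived above.

```python
from itertools import combinations
from math import isclose, prod

def product_form(nu, M):
    """Distribution (4): p(j) proportional to the product of nu_c over c in j."""
    states = [frozenset(j) for j in combinations(range(len(nu)), M)]
    weights = {j: prod(nu[c] for c in j) for j in states}
    Z = sum(weights.values())
    return {j: w / Z for j, w in weights.items()}

def rate(j, j2, nu, M, eps):
    """Transition rate q(j, j2): nonzero only if j2 swaps one content of j."""
    added, removed = j2 - j, j - j2
    if len(added) == 1 and len(removed) == 1:
        (c,) = added
        return eps * nu[c] / M
    return 0.0

def local_balance_holds(nu, M, eps):
    """Check p(j) q(j, j') == p(j') q(j', j) for all pairs of cache states."""
    p = product_form(nu, M)
    return all(
        isclose(p[j] * rate(j, k, nu, M, eps),
                p[k] * rate(k, j, nu, M, eps),
                rel_tol=1e-12, abs_tol=1e-15)
        for j in p for k in p
    )
```

The check succeeds because both sides of (5) equal $\frac{\epsilon}{ZM} \nu_c \nu_d \prod_{c'' \in j \cap j'} \nu_{c''}$, which is symmetric in the swapped contents $c$ and $d$.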

Thus, we can conclude that under this cache update 
strategy, the random cache state at any box eventually follows 
this stationary distribution. This is what we refer to as the 
"proportional-to-product" placement strategy, and it is the 
one we advocate in the Distributed Server Network scenario. 

Remark 1: The tunable parameter $\epsilon$ should not be too
large, otherwise the burden on the server will be increased due
to use of "push". Neither should it be too small, otherwise the
Markov chain will converge too slowly to the steady state. $\diamond$

Under the cache update strategy, the distribution of cache 
contents needs time to converge to the steady state. However, 
if we have a priori information about content popularity, we 
can use a sampling strategy as an alternative way to directly 
generate proportional-to-product content placement in one go. 
One method works as follows: 



Sampling-Based Preallocation 



Select successively $M$ contents at random in an i.i.d. fashion,
according to the probability distribution $\{\hat\nu_c\}$, where
$\hat\nu_c = \nu_c / \sum_{c' \in \mathcal{C}} \nu_{c'}$ is the normalized popularity. If there are
duplicate selections of some content, re-run the procedure.
It is readily seen that this yields a sample with the desired
distribution: each size-$M$ set $j$ is obtained through $M!$ orderings,
each of probability $\prod_{c \in j} \hat\nu_c$.
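
The preallocation procedure amounts to rejection sampling, which can be sketched in a few lines of Python. The function name `sample_cache` and the `rng` argument are assumptions made for this example, and contents are indexed from 0.

```python
import random

def sample_cache(nu_hat, M, rng=random):
    """Draw one cache state from the proportional-to-product distribution.

    nu_hat: normalized popularities; M: cache size (requires M <= len(nu_hat)).
    """
    contents = range(len(nu_hat))
    while True:
        # M i.i.d. draws according to the popularity distribution.
        draw = rng.choices(contents, weights=nu_hat, k=M)
        # Accept only if all M contents are distinct; otherwise re-run.
        if len(set(draw)) == M:
            return frozenset(draw)
```

The expected number of re-runs grows when a few contents carry most of the popularity, since duplicates then become likely; this is the situation in which an alternative sampling method may be preferable.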



An alternative sampling strategy which can be faster than 
the one described above when very popular items are present 
is given in Appendix C.

B. A Loss Network Under Many-User Asymptotics 

We now consider the asymptotic regime called "many-user
fixed-catalogue" scaling: The number of boxes $B$ goes to
infinity. The system load, defined as

$$\rho \triangleq \frac{\sum_{c \in \mathcal{C}} \nu_c}{BU}, \tag{6}$$

is assumed to remain fixed, which is achieved in the present
section by assuming that the content collection $\mathcal{C}$ is kept fixed,
while the individual rates $\{\nu_c\}$ scale linearly with $B$. We also
assume that the normalized content popularities $\{\hat\nu_c\}$ remain
fixed as $B$ increases. It thus holds that $\nu_c = \hat\nu_c \rho B U$ for all
$c \in \mathcal{C}$. Note that although boxes are pure resources rather than
users, letting $\{\nu_c\}$ scale to infinity with $B$ actually reflects a
"many-user" scenario.

To analyze the performance of our proposed proportional- 
to-product strategy, we require that the cache contents are sam- 
pled at random according to this strategy and are subsequently 



kept fixed. This can either reflect the situation where we use 
the previously introduced sampling strategy, or alternatively 
the situation where the cache update strategy has already made 
the distribution of cache states converge to the steady state, and 
occurs at a slower time scale than that at which new requests 
arise and complete. 

Note that, as $B$ grows large, the right-hand side in the
feasibility constraint (2) verifies, by the strong law of large
numbers,

$$\lim_{B \to \infty} \frac{1}{B}\, U \, |\{b \in \mathcal{B} : S \cap \mathcal{J}_b \ne \emptyset\}| = U \sum_{j:\, j \cap S \ne \emptyset} m_j \quad \text{almost surely}. \tag{7}$$

Here, $\{m_j\}$ corresponds to a particular content placement
strategy, under which each box holds a size-$M$ content set
$j$ with probability $m_j$, and this happens independently over
boxes. Specifically, $m_j = \prod_{c \in j} \hat\nu_c / Z$ (where $Z$ is a
normalizing constant) corresponds to our proportional-to-product
placement strategy.

We now establish a sequence of loss networks indexed by
a large parameter $B$. For the $B$-th loss network, requests for
content $c \in \mathcal{C}$ (regarded as "calls of type $c$") arrive at rate
$\nu_c^{(B)} = (\rho U \hat\nu_c) \cdot B$, each "virtual link" $S \subseteq \mathcal{C}$ has a capacity

$$W_S^{(B)} \triangleq \Big( U \sum_{j:\, j \cap S \ne \emptyset} m_j \Big) \cdot B, \tag{8}$$

and $c \in S$ represents that virtual link $S$ is part of the "route"
which serves calls of type $c$.^3 This particular setup has been
identified as the "large capacity network scaling" in Kelly [7].
There, it is shown that the loss probabilities in the limiting
regime where $B \to \infty$ can be characterized via the analysis
of an associated variational problem.

We now describe the corresponding results in [7]
relevant to our present purpose. For the $B$-th loss
network, consider the problem of finding the mode of
the stationary distribution (3), which corresponds to
maximizing $\sum_{c \in \mathcal{C}} \big( n_c^{(B)} \log \nu_c^{(B)} - \log n_c^{(B)}! \big)$ over feasible $n^{(B)}$.
Then, approximate $\log n_c^{(B)}!$ by $n_c^{(B)} \log n_c^{(B)} - n_c^{(B)}$ according
to Stirling's formula and replace the integer vector $n^{(B)}$
by a real-valued vector $x^{(B)}$. This leads to the following
optimization problem:

[OPT 1]

$$\max \ \sum_{c \in \mathcal{C}} \big( x_c^{(B)} \log \nu_c^{(B)} - x_c^{(B)} \log x_c^{(B)} + x_c^{(B)} \big) \tag{9}$$

$$\text{s.t.} \quad \forall\, S \subseteq \mathcal{C}, \quad \sum_{c \in S} x_c^{(B)} \le W_S^{(B)} \tag{10}$$

over $x^{(B)} \ge 0$.

^3 Note that this construction in fact admits a form of fixed routing which is
equivalently transformed from a dynamic routing model where each particular
box is regarded as a link and calls of type $c$ can use any single-link route
corresponding to a box holding content $c$. This equivalent transform is based
on the assumption that repacking is allowed (cf. Section 3.3 in [7]). We have
already obtained this equivalent transform by converting feasibility condition (1)
to (2) in Section III.
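The corresponding Lagrangian is given by:

$$L(x^{(B)}, y^{(B)}) = \sum_{c \in \mathcal{C}} \big( x_c^{(B)} \log \nu_c^{(B)} - x_c^{(B)} \log x_c^{(B)} + x_c^{(B)} \big) + \sum_{S \subseteq \mathcal{C}} y_S^{(B)} \Big( W_S^{(B)} - \sum_{c \in S} x_c^{(B)} \Big),$$

where $\{y_S^{(B)}\}_{S \subseteq \mathcal{C}}$ are the Lagrangian multipliers. The KKT
conditions for this convex optimization problem comprise the
original constraints and the following ones:

$$y_S^{(B)} \Big( \sum_{c \in S} x_c^{(B)} - W_S^{(B)} \Big) = 0, \quad y_S^{(B)} \ge 0, \quad \forall\, S \subseteq \mathcal{C},$$

$$\frac{\partial L}{\partial x_c^{(B)}} = \log \nu_c^{(B)} - \log x_c^{(B)} - \sum_{S:\, c \in S} y_S^{(B)} = 0, \quad \forall\, c \in \mathcal{C}, \tag{11}$$

where $(x^{(B)}, y^{(B)})$ is a solution to the optimization problem.
From equation (11), we further get

$$x_c^{(B)} = \nu_c^{(B)} \exp\Big( - \sum_{S:\, c \in S} y_S^{(B)} \Big), \quad \forall\, c \in \mathcal{C}. \tag{12}$$

Then the result that we will need from Kelly [7] is the
following: for the $B$-th loss network, the steady state probability
of accepting a request for content $c$, denoted by $A_c^{(B)}$, verifies

$$A_c^{(B)} = \exp\Big( - \sum_{S:\, c \in S} y_S^{(B)} \Big) + O\big(B^{-1/2}\big), \quad \forall\, c \in \mathcal{C}, \tag{13}$$

where $y_S^{(B)}$ are the Lagrangian multipliers of the previous
optimization problem.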

C. Optimality of Proportional-to-Product Content Placement

Note that the global acceptance probability, denoted by
$A_{sys}$, which also reads $A_{sys} = \sum_{c \in \mathcal{C}} \hat\nu_c A_c$, cannot exceed
$\min(1, 1/\rho)$. Indeed, it is clearly no larger than 1. It cannot
exceed $1/\rho$ either, otherwise the system would serve more
requests than its available resources allow.

We now prove that the proportional-to-product content
placement not only achieves the optimal global acceptance
probability $A_{sys} = \min(1, 1/\rho)$, but also achieves fair
individual acceptance probabilities, i.e., $A_c = A_{sys}$ for all $c$.
More precisely, we have the following theorem:

Theorem 1: By using $m_j = \prod_{c \in j} \hat\nu_c / Z$ for all $j \subseteq \mathcal{C}$
s.t. $|j| = M$, where $Z$ is the normalizing constant, we have
$\lim_{B \to \infty} A_c^{(B)} = \min\{1, 1/\rho\}$, $\forall\, c \in \mathcal{C}$, for fixed $\rho$ and $C$. $\diamond$

Before giving the proof, we comment on the result. First,
because of (7), the above optimal
acceptance rate is achieved with probability one under
any random sampling which follows the proportional-to-product
scheme. Secondly, the optimality of the asymptotic
acceptance probability does not depend on $M$, as long as
$M \ge 1$. Thus for this particular scaling regime, storage space
is not a bottleneck. As we shall see in the next two sections,
increasing $M$ does improve performance if either local
services occur, as in the Pure P2P Network scenario (Section V),
or if the catalogue size $C$ scales with the box population
size $B$, a case not covered by the classical literature on loss
networks, and to which we turn in Section VI-B.

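Proof: First, we consider $\rho \ge 1$. Letting

$$\exp\Big( - \sum_{S:\, c \in S} y_S^{(B)} \Big) = 1/\rho, \quad \forall\, c \in \mathcal{C}, \tag{14}$$

we have

$$\forall\, c \in \mathcal{C}, \quad \sum_{S:\, c \in S} y_S^{(B)} = \log \rho. \tag{15}$$

Putting equation (15) into (12) leads to

$$\forall\, c \in \mathcal{C}, \quad x_c^{(B)} = \nu_c^{(B)} / \rho.$$

Thus, inequality (10) in OPT 1 becomes

$$\forall\, S \subseteq \mathcal{C}, \quad \sum_{c \in S} \nu_c^{(B)} \le \rho \sum_{j:\, j \cap S \ne \emptyset} m_j B U. \tag{16}$$

Since $\nu_c^{(B)} = \rho B U \cdot \hat\nu_c$ and $\sum_{c \in \mathcal{C}} \hat\nu_c = 1$, inequality (16)
further becomes, upon explicitly writing out the normalization
constant $Z$:

$$\forall\, S \subseteq \mathcal{C}, \quad \sum_{c \in S} \hat\nu_c \cdot \sum_{\mathcal{G}:\, |\mathcal{G}| = M} \ \prod_{c' \in \mathcal{G}} \hat\nu_{c'} \ \le \ \sum_{c \in \mathcal{C}} \hat\nu_c \cdot \sum_{\mathcal{G}:\, |\mathcal{G}| = M,\ \mathcal{G} \cap S \ne \emptyset} \ \prod_{c' \in \mathcal{G}} \hat\nu_{c'}. \tag{17}$$

Two types of product terms (mapped to subsets $\mathcal{K} \subseteq \mathcal{C}$) appear
on both sides:

I. $\prod_{c \in \mathcal{K}} \hat\nu_c$: $|\mathcal{K}| = M + 1$, $\mathcal{K} \cap S \ne \emptyset$.

II. $\big( \prod_{c \in \mathcal{K}} \hat\nu_c \big) \cdot \hat\nu_{c'}$: $c' \in \mathcal{K} \cap S$, $|\mathcal{K}| = M$.

To show that inequality (17) holds, we only have to prove
that given any $S \subseteq \mathcal{C}$, for each product term (related to a $\mathcal{K}$)
which appears in the inequality corresponding to this $S$,
its multiplicity on the left-hand side is no more than that on
the right-hand side.

1. For a product term of Type I:

• On the LHS: We can write $\prod_{c \in \mathcal{K}} \hat\nu_c = \hat\nu_{c'} \prod_{c \in \mathcal{G}} \hat\nu_c$ for
some $\mathcal{G} \subseteq \mathcal{C}$ and $c' \in S \cap \mathcal{K}$, where $\mathcal{G}$ is a size-$M$
content set, $c' \notin \mathcal{G}$, and $\mathcal{K} = \mathcal{G} + \{c'\}$. It is easy to
see that we have $|S \cap \mathcal{K}|$ different choices of $c'$ in
a $\mathcal{K}$, so the multiplicity of this product term on the
LHS equals $|S \cap \mathcal{K}|$.
• On the RHS: When $|S \cap \mathcal{K}| \ge 2$, for any $c' \in \mathcal{K}$,
$\mathcal{K} \setminus \{c'\}$ is a size-$M$ content set whose intersection
with $S$ is not empty, hence the multiplicity equals
$|\mathcal{K}|$ ($= M + 1$). When $|S \cap \mathcal{K}| = 1$, the exception to
the above case is that if $c' \in S \cap \mathcal{K}$, then $\mathcal{K} \setminus \{c'\}$ is
a size-$M$ content set which has no intersection with $S$
and hence cannot appear in the second
summation term (over all size-$M$ content sets $\mathcal{G}$ s.t.
$\mathcal{G} \cap S \ne \emptyset$) in inequality (17). Thus, the multiplicity
equals $|\mathcal{K}| - 1$ ($= M$).

From the above, since $|S \cap \mathcal{K}| \le M + 1$ in the first case and
$1 \le M$ in the second, the multiplicity of the
product term on the LHS is always no more than that
on the RHS.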
2. For a product term of Type II:

$\mathcal{K}$ is actually already a size-$M$ content set $\mathcal{G}$ s.t. $\mathcal{G} \cap S \ne \emptyset$.
Therefore, it is easy to see that on both sides, the
multiplicities of this product term are both 1.

Now we can conclude that inequality (17) holds for all $S \subseteq \mathcal{C}$,
and continue to check the complementary slackness. Given
$\rho \ge 1$, one simple solution to equation (15) reads:

$$\forall\, S \subseteq \mathcal{C}, \quad y_S^{(B)} = \log \rho \cdot \mathbb{1}\{S = \mathcal{C}\}. \tag{18}$$

Besides, inequality (17) is tight for $S = \mathcal{C}$ (we do not even
need to check this when $\rho = 1$). Therefore, complementary
slackness is always satisfied with solution (18).
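
The capacity inequality used in this part of the proof, in its normalized form $\sum_{c \in S} \hat\nu_c \le \sum_{j:\, j \cap S \ne \emptyset} m_j$ for all $S \subseteq \mathcal{C}$, can be checked by brute force on a small catalogue. The Python sketch below is only a numerical sanity check under stated assumptions (hypothetical function names, contents indexed from 0, a small floating-point tolerance); it does not replace the counting argument.

```python
from itertools import combinations
from math import prod

def proportional_to_product(nu_hat, M):
    """Placement probabilities m_j proportional to the product of nu_hat over j."""
    sets = [frozenset(j) for j in combinations(range(len(nu_hat)), M)]
    weights = {j: prod(nu_hat[c] for c in j) for j in sets}
    Z = sum(weights.values())
    return {j: w / Z for j, w in weights.items()}

def capacity_constraints_hold(nu_hat, M, tol=1e-12):
    """Check sum_{c in S} nu_hat_c <= sum_{j: j meets S} m_j for every subset S."""
    m = proportional_to_product(nu_hat, M)
    C = len(nu_hat)
    for size in range(1, C + 1):
        for S in combinations(range(C), size):
            S = frozenset(S)
            demand_share = sum(nu_hat[c] for c in S)
            serving_share = sum(p for j, p in m.items() if j & S)
            if demand_share > serving_share + tol:
                return False
    return True
```

For $M = 1$ the two sides coincide for every $S$, and for $S = \mathcal{C}$ both sides equal 1 for any $M$, which is the tight constraint used in the complementary slackness check.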

So far we have proved that the KKT conditions hold when
$\rho \ge 1$. When $\rho < 1$, we modify (14) by letting

$$\exp\Big( - \sum_{S:\, c \in S} y_S^{(B)} \Big) = 1, \quad \forall\, c \in \mathcal{C}, \tag{19}$$

and hence there is an additional factor $1/\rho > 1$ on the RHS
of inequality (17). Since the old version of inequalities (17) is
proved to hold, the new version automatically holds, but none
of them is tight now. However, from (19) we have $y_S^{(B)} = 0$,
$\forall\, S \subseteq \mathcal{C}$, which means complementary slackness is always
satisfied (similar to $\rho = 1$).

Therefore, according to equation (13), it can be concluded
that by using $m_j = \prod_{c \in j} \hat\nu_c / Z$ for all $j$, we can achieve

$$A_c^{(B)} = \min\{1, 1/\rho\} + O\big(B^{-1/2}\big), \quad \forall\, c \in \mathcal{C},$$

so $\lim_{B \to \infty} A_c^{(B)} = \min\{1, 1/\rho\}$. ■

D. Simulation Results 

In this subsection, we use extensive simulations to evaluate
the performance of the two implementable schemes proposed
in Subsection IV-A which follow the "proportional-to-product"
placement strategy, namely the sampling-based preallocation
scheme and the demand-driven cache update (labeled as
"SAMP" and "CU", respectively).

We compare the results with the theoretical optimum (i.e.,
the loss rate for each content equals $(1 - 1/\rho)^+$; the curves
are labeled as "Optimal") and a uniform placement strategy
(labeled as "UNIF") defined as follows: first, permute
all the contents uniformly at random, resulting in a content
sequence $\{c_i\}$, for $1 \le i \le C$; then, push the $M$ contents
indexed by the subsequence $\{c_{(j \bmod C)}\}_{bM+1 \le j \le (b+1)M}$ into
the cache of box $b$, for $1 \le b \le B$. UNIF is also used to
generate the initial content placement for CU so that the loss
rate can be reduced during the warm-up period.
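
The UNIF construction can be sketched as follows. This Python illustration uses 0-based indices for both boxes and positions in the permuted sequence, which is an assumption of the example (the text indexes them from 1); the function name `unif_placement` is hypothetical.

```python
import random

def unif_placement(C, B, M, rng=random):
    """Fill B caches of size M by cycling through a random permutation of C contents."""
    perm = list(range(C))
    rng.shuffle(perm)  # content sequence c_0, ..., c_{C-1}
    # Box b receives the M consecutive positions b*M, ..., (b+1)*M - 1, taken modulo C.
    return [{perm[j % C] for j in range(b * M, (b + 1) * M)} for b in range(B)]
```

Because positions are consumed cyclically, replica counts of any two contents differ by at most one, and each cache holds $M$ distinct contents as long as $M \le C$.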

If not further specified, the default parameter setting is as
follows: The popularity of contents $\{\hat\nu_c\}$ follows a Zipf-like
distribution (see e.g. [4]), i.e.,

$$\hat\nu_c = \frac{(c_0 + c)^{-\alpha}}{\sum_{c' \in \mathcal{C}} (c_0 + c')^{-\alpha}}, \tag{20}$$

with a decaying factor $\alpha > 0$ and the shift $c_0 \ge 0$. We use
$\alpha = 0.8$ and $c_0 = 0$. The content catalogue size is $C = 500$ and
the number of boxes is $B = 4000$. Each box can store $M = 10$
contents and serve at most $U = 4$ concurrent requests.
The duration of downloading each content is exponentially
distributed with mean equal to 1 time unit. The parameter $\epsilon$
in the cache update algorithm is set such that $\epsilon B = 1$, i.e., upon a
request, one box will definitely be chosen for cache update.

Fig. 2: System loss rates under different traffic loads
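
The default workload can be generated directly from (20) and from the scaling $\nu_c = \hat\nu_c \rho B U$. The following Python sketch is an illustration only; the function names are hypothetical, and contents are indexed from 1 inside the formula as in (20).

```python
def zipf_like(C, alpha=0.8, c0=0.0):
    """Normalized popularities of (20) for contents c = 1, ..., C."""
    weights = [(c0 + c) ** (-alpha) for c in range(1, C + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def arrival_rates(nu_hat, rho, B, U):
    """Per-content Poisson rates nu_c = nu_hat_c * rho * B * U."""
    return [p * rho * B * U for p in nu_hat]

def optimal_loss(rho):
    """Theoretical optimum (1 - 1/rho)^+ for the loss rate of every content."""
    return max(0.0, 1.0 - 1.0 / rho)
```

With the default setting ($\rho = 1$, $B = 4000$, $U = 4$), the rates returned by `arrival_rates` sum to 16000 requests per time unit.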

For every algorithm, we take the average over 10 independent
repeated experiments, each of which is observed for 10
time units. On each sample path, the initial 1/5 of the
whole period is regarded as a "warm-up" period and hence
ignored in the calculation of the final statistics.^4

Some implementation details are not captured by our theo- 
retical model, but should be considered in simulations. Upon 
a request arrival, the most idle box (i.e., with the largest 
number of free connections) among all the boxes which hold 
the requested content is chosen to provide the service, for the 
purpose of load balancing. If none of them is idle, we use a 
heuristic repacking algorithm which iteratively reallocates the 
ongoing services among boxes, in order to handle as many 
requests as possible while still respecting load balancing. One
important parameter which trades off the repacking complexity
and the performance is the maximum number of iterations
$t^{max}$, which is set as "undefined" by default (i.e., the iterations
will continue until the algorithm terminates; theoretically there
are at most $C$ iterations). Other details regarding the repacking
algorithm can be found in Appendix D. We will see an
interesting observation about $t^{max}$ later.
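
The box selection rule applied before any repacking can be sketched as follows. This Python fragment is a minimal illustration, assuming the current number of ongoing uploads of each box is kept in a list `load`; the function name is hypothetical and the repacking heuristic itself (Appendix D) is not reproduced here.

```python
def pick_most_idle_box(c, caches, load, U):
    """Return the box holding content c with the most free connections.

    caches: list of sets; load: ongoing uploads per box; U: connections per box.
    Returns None when every box holding c is fully busy (repacking is then needed).
    """
    candidates = [b for b, cache in enumerate(caches) if c in cache and load[b] < U]
    if not candidates:
        return None
    # Largest number of free connections; ties go to the lowest box index.
    return max(candidates, key=lambda b: U - load[b])
```

A return value of `None` corresponds to the case where the simulator falls back on the heuristic repacking algorithm, or rejects the request when no repacking is allowed.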

Figure 2 evaluates system loss rates under different traffic
loads $\rho$. Our two algorithms SAMP and CU, which
target the proportional-to-product placement, both match the
theoretical optimum very well.^5 On the other hand, the
UNIF algorithm, which does not utilize any information about
content popularity, incurs a large loss even if the system is
underloaded ($\rho < 1$). The gain of proportional-to-product
placement over UNIF becomes less significant as the traffic
load grows, as can be expected.

^4 We can get enough samples during each observation period of 10 time
units (for example, when $\rho = 1$, $B = 4000$ and $U = 4$, the average number of arrivals
would be 160000). It has also been checked that after the warm-up period,
the distribution of cache states approximates the proportional-to-product
placement well and remains stable for the remaining observation period.

^5 In fact, around $\rho = 1$, they perform a little worse than the optimum. The
reason is that $\rho = 1$ is the "critical traffic load" (a separation point between
zero-loss and nonzero-loss ranges), at which the simulation results
deviate more easily from the theoretical value.

Fig. 3: System loss rates with different $\alpha$ ($\rho = 1$)

Fig. 4: Effect of repacking on the system loss rate

In Figure 3, when the decaying factor $\alpha$ in the Zipf-like
distribution increases, the distribution of placed contents
generated by UNIF has a higher discrepancy from the real
content popularity distribution, so UNIF performs worse. On
the other hand, the two proportional-to-product strategies are
insensitive to the change of content popularity, as we expected.

Figure 4 shows the effect of repacking on the system loss
rate. In sub-figure (a), we find that under SAMP, repacking is
not necessary. In sub-figure (b), which shows the performance
of CU, when $\rho$ is low, one iteration of repacking is sufficient
to make the performance close enough to the optimum; when
$\rho$ is high, repacking also becomes unnecessary. The main
take-away message from this figure is that we can execute a
repacking procedure of very small complexity without sacrificing
much performance. The reason is that when the server picks
a box to serve a request, it already respects the rule of load
balancing.

We then explain why CU still needs one iteration of
repacking to improve the performance when $\rho$ is low. Note
that during a cache update, it is possible that the box is
currently uploading the "to-be-kicked-out" content to some
users. If repacking is enabled, those ongoing services can be
repacked to other boxes (see details in Appendix D), but if
repacking is disabled (the maximum number of repacking iterations
is set to 0), they will be terminated and counted
as losses. When $\rho$ is high, however, boxes are more likely to
be busy, which leads to the failure of repacking, so repacking
Fig. 5: System loss rates with different numbers of boxes
Fig. 6: Loss rate of requests for each content ($\rho = 1$)



makes no difference. 

Recall that the proportional-to-product placement is only
optimal when the number of boxes $B \to \infty$. Figures 5 and
6 then show the impact of a finite $B$. In Figure 5, as $B$
decreases, the system loss rate of every algorithm increases
(compared to the two proportional-to-product strategies, UNIF
is less sensitive to $B$). In Figure 6, non-homogeneity in the
individual loss rates of requests for each content also reflects
a deviation from the theoretical result (when $B \to \infty$, the
loss rates of the requests for all the contents are proved to be
identical). As expected, increasing the number of boxes (from
4000 to 8000) brings the system closer to the limiting scenario
and makes the individual loss rates more homogeneous. Another
observation is that as the popularity of a content decreases (in
the figure, the contents are indexed in descending order of
popularity), the individual loss rate increases. However,
according to Figure 5, those less popular contents do not affect
the system loss rate much even if they incur high loss, since
their weights $\{\bar\nu_c\}$ are also lower.

In fact, if we choose a smaller content catalogue size $C$ or
a larger cache size $M$, simulations show that the negative impact
of a finite $B$ is reduced (the figures are omitted here).
This tells us that if $C$ scales with $B$ rather than being fixed,
the proof of optimality under the loss network framework in
Subsection IV-B is no longer valid and $M$ becomes a bottleneck
for the performance of the optimal algorithm. We will
address this problem by introducing a certain type of "large
catalogue model" later in Section VI.
V. Optimal Content Placement in Pure
Peer-to-Peer Networks

In the Pure P2P Network scenario, when box b has a request 
for content c which is currently in its own cache, a "local 
service" will be provided and no download bandwidth in the 
network will be consumed. To simplify our analysis, each 
request for a specific content is assumed to originate from 
a box chosen uniformly at random (this in particular assumes 
identical tastes of all users). 

This means that the effective arrival rate of the requests for
content $c$ which generate traffic load actually equals
$\tilde\nu_c = \nu_c(1 - \tilde m_c)$, where $\tilde m_c$ is defined as the
fraction of boxes that have cached content $c$. Let
$\rho_c = \rho\bar\nu_c$ denote the traffic load
generated by requests for content $c$, and $\lambda_c$ denote the fraction
of the system bandwidth resources used to serve requests for
content $c$. Obviously, $\sum_{c\in\mathcal C}\lambda_c \le 1$. The traffic load absorbed
by the P2P system either via local services or via service from
another box is then upper-bounded by

$$\tilde\rho = \sum_{c\in\mathcal C}\Bigl(\rho_c\tilde m_c + [\rho_c(1-\tilde m_c)] \wedge \lambda_c\Bigr), \qquad (21)$$

where "$\wedge$" denotes the minimum operator.

We will use this simple upper bound to identify an optimal
placement strategy in the present Pure P2P Network
scenario. To this end, we shall establish that our candidate
placement strategy asymptotically achieves this performance
bound, namely absorbs a portion $\tilde\rho$ in the limit where $B$ tends
to infinity.

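To find the optimal strategy, we introduce a variable
$x_c = [\rho_c(1-\tilde m_c)] \wedge \lambda_c$ for all $c$. Note further that the fraction
$\lambda_c$ is necessarily bounded from above by $\tilde m_c$, as only those
boxes holding $c$ can devote their bandwidth to serving $c$. It
is then easy to see that the quantity $\tilde\rho$ in (21) is no larger
than the optimal value of the following linear programming
problem:

[OPT 2]

$$\max_{\tilde m,\lambda,x}\ \sum_{c\in\mathcal C}(\rho_c\tilde m_c + x_c)$$

$$\text{s.t.}\quad \forall c\in\mathcal C:\ 0\le\tilde m_c\le 1,\ 0\le\lambda_c\le\tilde m_c;$$

$$\forall c\in\mathcal C:\ 0\le x_c\le\lambda_c,\ x_c\le\rho_c(1-\tilde m_c);$$

$$\sum_{c\in\mathcal C}\tilde m_c = M,\quad \sum_{c\in\mathcal C}\lambda_c\le 1.$$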

The following theorem gives the structure of an optimal 
solution to OPT 2, and as a result suggests an optimal 
placement strategy. 

Theorem 2: Assume that $\{\nu_c\}$ are ranked in descending
order. The following solution solves OPT 2:

• For $1 \le c \le M-1$, $\tilde m_c = 1$, $\lambda_c = x_c = 0$.
• For $M \le c \le c^*$, $\tilde m_c = \lambda_c = x_c = \rho_c/(1+\rho_c)$, where
$c^*$ satisfies

$$\sum_{c=M}^{c^*}\frac{\rho_c}{1+\rho_c} \le 1, \quad\text{but}\quad \sum_{c=M}^{c^*+1}\frac{\rho_c}{1+\rho_c} > 1.$$

• For $c = c^*+1$, $\tilde m_c = \lambda_c = x_c = 1 - \sum_{c=M}^{c^*}\tilde m_c$.

• For $c^*+2 \le c \le C$, $\tilde m_c = \lambda_c = x_c = 0$. ◇

The proof consists in checking that the KKT conditions
are met for the above candidate solution. Details are given in
Appendix E.

The above optimal solution suggests the following place- 
ment strategy: 



"Hot- Warm-Cold" Content Placement Strategy 



Divide the contents into three different classes according to
their popularity ranking (in descending order):

• Hot: The $M-1$ most popular contents. At each box,
$M-1$ cache slots are reserved for them to make sure
that requests for these contents are always met via local
service.

• Warm: The contents with indices from $M$ to $c^*+1$ (or
$c^*$ if $\sum_{c=M}^{c^*}\tilde m_c = 1$). For these contents, a fraction $\tilde m_c$
of all the boxes store content $c$ in their one remaining
cache slot, where the value of $\tilde m_c$ is given in Theorem 2.
All requests for these contents (except $c^*+1$ if it is
classified as "warm") can be served, at the expense of all
bandwidth resources.

• Cold: The other, less popular contents are not cached at
all.
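
The strategy above can be sketched numerically. The following Python snippet is only an illustration under stated assumptions: the function name `hot_warm_cold`, the 0-based indexing, and the choice to evaluate the bound of (21) with the bandwidth share of each warm content set equal to its replication fraction are ours, not the paper's. It fills the hot slots, then water-fills the single remaining slot with $\rho_c/(1+\rho_c)$ until the budget of 1 is exhausted.

```python
def hot_warm_cold(rho_c, M):
    """Sketch of the hot-warm-cold placement.

    rho_c: per-content loads, sorted in descending order (hypothetical input).
    M:     number of cache slots per box.
    Returns (m, bound): m[c] is the fraction of boxes caching content c
    (0-based index), bound is the upper bound (21) for this placement.
    """
    C = len(rho_c)
    m = [0.0] * C
    # Hot contents: the M-1 most popular ones are cached at every box.
    for c in range(min(M - 1, C)):
        m[c] = 1.0
    # Warm contents: share the one remaining slot, rho/(1+rho) each,
    # until the total fraction reaches 1 (the last one may be truncated).
    budget = 1.0
    for c in range(M - 1, C):
        take = min(rho_c[c] / (1.0 + rho_c[c]), budget)
        m[c] = take
        budget = max(0.0, budget - take)
        if budget == 0.0:
            break
    # Bound (21): local service plus remote service capped by bandwidth.
    bound = 0.0
    for c in range(C):
        lam = m[c] if c >= M - 1 else 0.0  # assumed bandwidth share
        bound += rho_c[c] * m[c] + min(rho_c[c] * (1.0 - m[c]), lam)
    return m, bound
```

For instance, with loads `[3.0, 3.0, 1.0]` and `M = 2`, the first content is hot, the second is fully warm with fraction 0.75, and the third receives the leftover 0.25 and is therefore only partially served.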



Remark 2: The requests for the $c^*$ most popular contents
("hot" contents and "warm" contents except content $c^*+1$)
incur zero loss, while the requests for the $C - c^* - 1$ least
popular contents incur 100% loss. There is a partial loss in
the requests for content $c^*+1$ if $\sum_{c=M}^{c^*}\tilde m_c < 1$.

Note that the placement for "warm" contents resembles the
"water-filling" solution to the problem of allocating transmission
powers to different OFDM channels to maximize the
overall achievable channel capacity in the context of wireless
communications [16]. ◇

Under this placement strategy, the maximum upper bound
on the absorbed traffic load reads

$$\tilde\rho = \sum_{c=1}^{c^*}\rho_c + (\rho_{c^*+1}+1)\Bigl(1-\sum_{c=M}^{c^*}\frac{\rho_c}{1+\rho_c}\Bigr).$$

We then have the following corollary:

Corollary 1: Considering the large system limit $B \to \infty$,
with fixed catalogue and associated normalized popularities
$\{\bar\nu_c\}$ as considered in Subsection IV-B, the proposed
"hot-warm-cold" placement strategy achieves an asymptotic
fraction of absorbed load equal to the above upper bound $\tilde\rho$, and
is hence optimal in this sense. ◇

Proof: With the proposed placement strategy, hot (respectively,
cold) contents never trigger accepted requests, since all
incoming requests are handled by local service (respectively,
rejected). For warm contents, because each box holds only one
warm content, it can only handle requests for that particular
warm content. As a result, the processes of ongoing requests
for distinct warm contents evolve independently of one another.
For a given warm content $c$, the corresponding number
of ongoing requests behaves as a simple one-dimensional loss
network with arrival rate $\nu_c(1-\tilde m_c)$ and service capacity
$\tilde m_c BU$. For $c = M, \ldots, c^*$, one has $\tilde m_c = \rho_c/(1+\rho_c)$, where
$\rho_c = \nu_c/(BU)$, so both the arrival rate and the capacity of
the corresponding loss network equal $\tilde m_c BU$. The
acceptance probability then converges to 1 as $B \to \infty$, and
the accepted load due to both local service and services
from other boxes converges to $\rho_c$. For content $c^*+1$ (if
$\tilde m_{c^*+1} > 0$), the corresponding loss network has arrival rate
$\nu_{c^*+1}(1-\tilde m_{c^*+1})$ and service capacity $\tilde m_{c^*+1}BU$. Then, in the
limit $B \to \infty$, the accepted load (due to both local services
and services from other boxes) reads $\rho_{c^*+1}\tilde m_{c^*+1} + \tilde m_{c^*+1}$
(which is actually smaller than $\rho_{c^*+1}$). Summing the accepted
loads of all contents yields the result. ■



VI. Large Catalogue Model

Keeping the many-user asymptotic, we now consider an
alternative model of content catalogue, which we term the
"large catalogue" scenario. The set of contents $\mathcal C$ is divided
into a fixed number of "content classes", indexed by $i \in \mathcal I$.
In class $i$, all the contents have the same popularity (arrival
rate) $\nu_i$. The number of contents within class $i$ is assumed
to scale in proportion to the number of boxes $B$, i.e., class $i$
contains $\alpha_i B$ contents for some fixed scaling factor $\alpha_i$. We
further define $\alpha = \sum_{i\in\mathcal I}\alpha_i$. With the above assumptions, the
system traffic load $\rho$ in equation (6) reads

$$\rho = \frac{\sum_{i\in\mathcal I}\alpha_i\nu_i}{U}. \qquad (22)$$

The primary motivation for this model is mathematical convenience:
by limiting the number of popularity values we limit
the "dimensionality" of the request distribution, even though
we now allow for a growing number of contents. It can also be
justified as an approximation that would result from batching
into a single class all contents with a comparable popularity.
Such classes can also capture the movie type (e.g. thriller,
comedy) and age (assuming popularity decreases with content
age).

We use $\bar\nu_i$ to denote the normalized popularity of content
class $i \in \mathcal I$, so that $\sum_{i\in\mathcal I}\bar\nu_i = 1$. It is reasonable to regard
each $\bar\nu_i$ as fixed. The ratio $\bar\nu_i/(\alpha_i B)$ represents the normalized
popularity of a specific content in class $i$, which decreases as
the number of contents in this class, $\alpha_i B$, increases, since users
now have more choices within each class. In practice, an online
video provider which uses the Distributed Server
Network architecture adds both boxes and available movies of
each type to attract more user traffic, under a constraint of a
maximum tolerable traffic load $\rho$.

Returning to the Distributed Server Network model of
Section IV, we consider the following questions: What amount
of storage is required to ensure that memory space is not a
bottleneck? Is the proportional-to-product placement strategy
still optimal under the large-catalogue scaling?

A. Necessity of Unbounded Storage

We first establish that bounded storage will strictly
constrain utilization of bandwidth resources. To this end we
need the following lemma:

Lemma 1: Consider the system under large catalogue scaling,
with fixed weights $\alpha_i$ and cache size $M$ per box. Define
$M' = \lceil 2M/\alpha \rceil$. Then

(i) More than half of the contents are replicated at most $M'$
times, and

(ii) For each of these contents, the loss probability is at least
$E(\inf_i \nu_i, M'U) > 0$, where $E(\cdot,\cdot)$ is the Erlang function
defined as

$$E(\nu, C) = \frac{\nu^C}{C!}\left[\sum_{n=0}^{C}\frac{\nu^n}{n!}\right]^{-1}.$$ ◇



Proof: We first prove part (i). Note that the total number
of content replicas in the system equals $BM$. Thus, denoting
by $f$ the fraction of contents replicated at least $M'+1$ times,
it follows that $f\alpha B(M'+1) \le BM$, which in turn yields

$$f \le \frac{M}{\alpha(\lceil 2M/\alpha\rceil + 1)} < \frac{M}{2M} = \frac{1}{2},$$

which implies statement (i).

To prove part (ii), we establish the following general property
for a loss network (equivalent to our original system) with
call types $j \in \mathcal J$, corresponding arrival rates $\nu_j$, and capacity
(maximal number of competing calls) $C_\ell$ on link $\ell$ for all
$\ell \in \mathcal L$. We use $\ell \in j$ to indicate that the route for calls of type
$j$ comprises link $\ell$. Denoting the loss probability of calls of
type $j$ in such a loss network by $p_j$, we then want to prove

$$p_j \ge E(\nu_j, C_j), \qquad (23)$$

where $C_j = \min_{\ell\in j} C_\ell$, i.e., the capacity of the bottleneck
link on the route for calls of type $j$.

Note that the RHS of the above inequality is actually the
loss probability of a loss network with only calls of type $j$
and capacity $C_j$. Fixing index $j$, we define this loss network
as an auxiliary system and consider the following coupling
construction, which allows us to deduce inequality (23): Let $X_k$
be the number of active calls of type $k$ in the original system
for all $k$, and let $X'_j$ denote the number of active calls of type
$j$ in the auxiliary system. Initially, $X_j(0) = X'_j(0)$. The
non-zero transition rates for the joint process $(\{X_k\}_{k\in\mathcal J}, X'_j)$ are
given by



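$$\begin{aligned}
k \ne j:\quad & X_k \to X_k + 1 && \text{at rate } \nu_k \prod_{\ell\in k}\mathbf 1\Bigl\{\sum_{k':\,\ell\in k'} X_{k'} < C_\ell\Bigr\},\\
k \ne j:\quad & X_k \to X_k - 1 && \text{at rate } X_k,\\
& (X_j, X'_j) \to (X_j + 1, X'_j + 1) && \text{at rate } \nu_j^{\mathrm{both}},\\
& (X_j, X'_j) \to (X_j + 1, X'_j) && \text{at rate } \nu_j^{\mathrm{o}},\\
& (X_j, X'_j) \to (X_j, X'_j + 1) && \text{at rate } \nu_j^{\mathrm{aux}},\\
& (X_j, X'_j) \to (X_j - 1, X'_j - 1) && \text{at rate } X_j,\\
& (X_j, X'_j) \to (X_j, X'_j - 1) && \text{at rate } X'_j - X_j,
\end{aligned}$$

where, writing $a_j = \prod_{\ell\in j}\mathbf 1\{\sum_{k:\,\ell\in k} X_k < C_\ell\}$ for the indicator that the original system can accept a call of type $j$,

$$\nu_j^{\mathrm{both}} = \nu_j\, a_j\, \mathbf 1\{X'_j < C_j\}, \quad \nu_j^{\mathrm{o}} = \nu_j\, a_j\, \mathbf 1\{X'_j = C_j\}, \quad \nu_j^{\mathrm{aux}} = \nu_j\,(1 - a_j)\, \mathbf 1\{X'_j < C_j\}.$$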

It follows from Theorem 8.4 in [5] that $\{X_k\}$ is indeed a loss
network process with the original dynamics, and that $X'_j$ is
a one-dimensional loss network with capacity $C_j$ and arrival
rate $\nu_j$. From the construction, we can see that all transitions
preserve the inequality $X_j(t) \le X'_j(t)$ for all $t \ge 0$, for the
following reason: once $X_j$ increases by 1, $X'_j$ either increases
by 1 or equals the capacity limit $C_j$, and in the latter case, the
corresponding transition rate $\nu_j^{\mathrm{o}}$ implies that $X_j < C_j = X'_j$.
Similarly, once $X'_j$ decreases by 1, either $X_j$ also decreases
by 1, or, in the case that $X_j$ does not decrease, it must be that
the transition rate $X'_j - X_j$ is strictly positive. In any case, the
above inequality is preserved.

We further let $A_j(t)$, $A'_j(t)$ denote the number of type $j$
external calls, $L_j(t)$, $L'_j(t)$ the number of type $j$ call rejections,
and $D_j(t)$, $D'_j(t)$ the number of type $j$ call completions,
respectively in the original and auxiliary systems, during time
interval $[0, t]$. It follows from our construction that whenever
the service for a call of type $j$ completes in the original
system, the service for a call of type $j$ also completes in the
auxiliary system, hence $D_j(t) \le D'_j(t)$ for all $t \ge 0$. Since
$X_j(t) = A_j(t) - D_j(t) - L_j(t)$, $X'_j(t) = A'_j(t) - D'_j(t) - L'_j(t)$
and $A_j(t) = A'_j(t)$, we have $L_j(t) \ge L'_j(t)$. Upon dividing
this inequality by $A_j(t)$ and letting $t$ tend to infinity, one
retrieves the announced inequality (23) by the ergodic theorem.

Back to the context of our P2P system, for those contents
which are replicated at most $M'$ times (i.e., the contents
considered in part (i)), the rejection rate of content $c$ of type
$j$ reads $p_j \ge E(\nu_j, C_j) \ge E(\inf_i \nu_i, M'U)$. ■

The above lemma readily implies the following corollary:
Corollary 2: Under the assumptions of Lemma 1, the overall
rejection probability is at least $\frac{1}{2}E(\min_i \nu_i, M'U)$. Indeed,
for bounded $M$, $M'$ is also bounded, and $E(\min_i \nu_i, M'U)$
is bounded away from 0. ◇
Thus, even when the system load $\rho$ is strictly less than 1,
with bounded $M$ there is a non-vanishing fraction of rejected
requests, hence a suboptimal use of bandwidth.
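
To make the per-content loss floor of Lemma 1 concrete, the following Python sketch evaluates the Erlang function with the standard numerically stable recursion and plugs in $M' = \lceil 2M/\alpha \rceil$. The function names and the sample parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def erlang_b(load, capacity):
    """Erlang loss probability E(load, capacity).

    Uses the recursion B(0) = 1, B(n) = load*B(n-1) / (n + load*B(n-1)),
    which avoids computing large factorials directly.
    """
    b = 1.0
    for n in range(1, capacity + 1):
        b = load * b / (n + load * b)
    return b


def lemma1_floor(nu_min, alpha, M, U):
    """Loss floor E(nu_min, M'U) with M' = ceil(2M/alpha).

    nu_min: smallest per-content arrival rate over the classes.
    alpha:  total catalogue scaling factor (sum of the alpha_i).
    M, U:   cache slots and upload slots per box.
    """
    m_prime = math.ceil(2 * M / alpha)
    return erlang_b(nu_min, m_prime * U)


if __name__ == "__main__":
    # Hypothetical values: the floor is positive for every bounded M
    # and only shrinks as M grows.
    for M in (1, 2, 4, 8):
        print(M, lemma1_floor(nu_min=1.0, alpha=2.0, M=M, U=1))
```

The floor is strictly positive for any fixed $M$ and decreases as $M$ grows, which is the quantitative content of the statement that storage must grow for the loss to vanish.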

B. Efficiency of Proportional-to-Product Placement 

We consider the following "Modified Proportional-to-Product
Placement": Each of the $M$ storage slots at a given
box $b$ contains a randomly chosen content. The probability of
selecting one particular content $c$ is $\nu_i/(\rho BU)$ if it belongs to
class $i$. In addition, we assume that the selections for all such
$MB$ storage slots are done independently of one another.

Remark 3: This content placement strategy can be viewed
as a "balls-and-bins" experiment. All the $MB$ cache slots in
the system are regarded as balls, and all the $|\mathcal C|$ ($= \alpha B$)
contents are regarded as bins. We throw each of the $MB$
balls at random among all the $|\mathcal C|$ bins. Bin $c$ (corresponding
to content $c$ which belongs to class $i$) will be chosen with
probability $\nu_i/(\rho BU)$. Alternatively, the resulting allocation
can be viewed as a bipartite random graph connecting boxes
to contents. ◇
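
The balls-and-bins view can be simulated directly. The sketch below is an illustration only: the function name, the `(class, index)` labelling of contents, and the rounding of $\alpha_i B$ to an integer are our assumptions. Each of the $MB$ slots draws a content with weight $\nu_i$; since the weights sum to $\rho BU$, this matches the selection probability $\nu_i/(\rho BU)$.

```python
import random
from collections import Counter


def modified_placement(B, M, alphas, nus, seed=0):
    """Fill all M*B cache slots independently (duplicates allowed).

    B, M:    number of boxes and slots per box.
    alphas:  per-class catalogue scaling factors alpha_i.
    nus:     per-class content popularities nu_i.
    Returns (boxes, counts): boxes[b] is the list of M contents at box b,
    counts[c] is the number of replicas N_c of content c.
    """
    rng = random.Random(seed)
    contents, weights = [], []
    for i, (alpha_i, nu_i) in enumerate(zip(alphas, nus)):
        for k in range(round(alpha_i * B)):  # class i holds alpha_i*B contents
            contents.append((i, k))
            weights.append(nu_i)  # random.choices normalises the weights
    boxes = [rng.choices(contents, weights=weights, k=M) for _ in range(B)]
    counts = Counter(c for box in boxes for c in box)
    return boxes, counts
```

The list `boxes` is the box side of the bipartite graph mentioned in the remark, and `counts` gives the degree of each content.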

Note that this strategy differs from the "proportional-to-product"
placement strategy proposed in Section IV in that
it allows for multiple copies of the same content at the same
box. However, by the birthday paradox, we can prove the
following lemma, which shows that up to a negligible fraction
of boxes, the above content placement does coincide with the
proportional-to-product strategy.

Lemma 2: By using the above content placement strategy,
at a certain box, if $M \ll \sqrt{(\min_i \alpha_i)B}$,

$$\Pr(\text{all the } M \text{ cached contents are different}) \approx 1. \qquad (24)$$ ◇

Proof: In the birthday paradox, if there are $m$ people
and $n$ equally likely birthdays, the probability that all the
$m$ people have different birthdays is close to 1 whenever
$m \ll \sqrt{n}$. Here in our problem, at a certain box, the $M$
cache slots are regarded as "people" and the $|\mathcal C|$ contents are
regarded as "birthdays." Although the probability of picking
one content is non-uniform, the probability of picking one
content within a specific class is uniform. One can think of
picking a content for a cache slot as a two-step process: With
probability $\alpha_i\nu_i/\sum_j \alpha_j\nu_j$, a content in class $i$ is chosen. Then,
conditioned on class $i$, a specific content is chosen uniformly
at random among all the $\alpha_i B$ contents in class $i$.

Contents from different classes are obviously different.
When $M \ll \sqrt{\alpha_i B}$, even if all the $M$ cached contents are
from class $i$, the probability that they are different is close to
1. Thus, $M \ll \sqrt{\min_i \alpha_i B}$ is sufficient for (24) to hold. ■
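
The birthday-paradox step can be checked numerically. The short Python sketch below computes the exact probability that $m$ uniform draws among $n$ equally likely items are all distinct; the function name and the sample values are illustrative assumptions.

```python
def prob_all_distinct(m, n):
    """Probability that m independent uniform draws from n items are all different.

    This is the product (1 - 0/n)(1 - 1/n)...(1 - (m-1)/n).
    """
    p = 1.0
    for k in range(m):
        p *= (n - k) / n
    return p


if __name__ == "__main__":
    # Worst case of the proof: all M slots fall in the smallest class,
    # which holds n = alpha_min * B contents (hypothetical values).
    print(prob_all_distinct(10, 10**6))   # M far below sqrt(n): close to 1
    print(prob_all_distinct(23, 365))     # classic birthday case: below 1/2
```

With $m$ well below $\sqrt{n}$ the product stays close to 1, while for $m$ of order $\sqrt{n}$ it drops visibly, which is why the condition on $M$ in Lemma 2 is stated relative to $\sqrt{(\min_i \alpha_i)B}$.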

To prove that under this particular placement, inefficiency
in bandwidth utilization vanishes as $M \to \infty$, we shall in
fact consider a slight modification of the "request repacking"
strategy considered so far for determining which contents to
accept:



Counter-Based Acceptance Rule 



A parameter $L > 0$ is fixed. Each box $b$ maintains at all
times a counter $Z_b$ of associated requests. For any content
$c$, the following procedure is used by the server whenever a
request arrives: A random set of $L$ distinct boxes, each of
which holds a replica of content $c$, is selected. An attempt is
made to associate the newly arrived request with all $L$ boxes,
but the request will be rejected if its acceptance would lead
any of the corresponding box counters to exceed $LU$.
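
The rule above can be sketched as a small Python class. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, `holders` is assumed to list distinct boxes for each content, and a request for a content held by fewer than `L` boxes is simply rejected (the rule itself does not specify this case).

```python
import random


class CounterBasedServer:
    """Sketch of the counter-based acceptance rule."""

    def __init__(self, holders, num_boxes, L, U, seed=0):
        self.holders = holders      # content -> list of distinct boxes holding it
        self.L = L
        self.U = U
        self.Z = [0] * num_boxes    # counter Z_b of requests associated to box b
        self.rng = random.Random(seed)

    def arrive(self, c):
        """Try to accept a request for content c.

        Returns the list of L associated boxes, or None if rejected.
        """
        boxes = self.holders.get(c, [])
        if len(boxes) < self.L:
            return None             # assumption: too few holders means rejection
        chosen = self.rng.sample(boxes, self.L)
        # Reject if any chosen counter would exceed L*U after acceptance.
        if any(self.Z[b] + 1 > self.L * self.U for b in chosen):
            return None
        for b in chosen:
            self.Z[b] += 1
        return chosen

    def depart(self, chosen):
        """Release the counters of a completed request."""
        for b in chosen:
            self.Z[b] -= 1
```

A caller keeps the list returned by `arrive` and passes it to `depart` when the service completes, so that each box counter always equals the number of ongoing requests associated with that box.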
Remark 4: Note that in this acceptance rule, associating a
request to a set of $L$ boxes does not mean that the requested
content will be downloaded from all these $L$ boxes. In fact,
as before, the download stream will only come from one of
the $L$ boxes, but here we do not specify which one is to be
picked.

It is readily seen that the above rule defines a loss network.
Moreover, it is a stricter acceptance rule than the previously
considered one. Indeed, when all ongoing requests have an
associated set of $L$ boxes whose counters are no larger than
$LU$, it can be verified that there exist nonnegative integers $Z_{cb}$
such that $\sum_{b:\,c\in\mathcal J_b} Z_{cb} = L n_c$, $\forall c \in \mathcal C$, and
$\sum_{c:\,c\in\mathcal J_b} Z_{cb} \le LU$, $\forall b \in \mathcal B$; feasibility condition (2)
then holds a fortiori. ◇



We introduce an additional assumption, needed for technical 
reasons. 

Assumption 1: A content which is too poorly replicated is
never served. Specifically, a content must be replicated at
least $M^{3/4}$ times to be eligible for service. ◇

Our main result in this context is the following theorem:
Theorem 3: Consider fixed $M$, $\alpha_i$, $\nu_i$, and corresponding
load $\rho < 1$. Then for a suitable choice of parameter $L$, with
high probability (with respect to placement) as $B \to \infty$, the
loss network with the above "modified proportional-to-product
placement" and "counter-based acceptance rule" admits a
content rejection probability $\phi(M)$ for some function $\phi(M)$
decreasing to zero as $M \to \infty$. ◇

The interpretation of this theorem is as follows: The fraction
of lost service opportunities, for an underloaded system
($\rho < 1$), vanishes as $M$ increases. Thus, while Corollary 2
showed that $M \to \infty$ is necessary for optimal performance,
this theorem shows that it is also sufficient: there is no need
for a minimal growth speed (e.g. $M \ge \log B$) to ensure that the loss
rate becomes negligible.

The proof is given in Appendix A.

VII. Conclusion 

In peer-to-peer video-on-demand systems, information
on content popularity can be utilized to design optimal content
placement strategies, which minimize the fraction of rejected
requests in the system, or equivalently, maximize the
utilization of peers' uplink bandwidth resources. We focused
on P2P systems where the number of users is large. For
the limited content catalogue size scenario, we proved the
optimality of a proportional-to-product placement in the
Distributed Server Network architecture, and proved optimality
of the "Hot-Warm-Cold" placement in the Pure P2P Network
architecture. For the large content catalogue scenario, we also
established that proportional-to-product placement leads to
optimal performance in the Distributed Server Network. Many
interesting questions remain. To name only two, more general
popularity distributions (e.g. Zipf) for the large catalogue
scenario could be investigated; the efficiency of adaptive cache
update rules such as the one discussed in Section IV-A, or
classical alternatives such as LRU, in conjunction with a loss
network operation, also deserves more detailed analysis.
Appendix

A. Proof of Theorem 3

The proof has five sequential stages: 

1) The chance for a content to be "good"

Let $N_c$ denote the number of replicas of content $c$ of class
$i$. Then, $N_c$ admits a binomial distribution with parameters
$(MB, \nu_i/(\rho BU))$. We call content $c$ a "good" content if
$|N_c - \mathbb E[N_c]| < M^{2/3}$, i.e.,

$$\left|N_c - \frac{\nu_i M}{\rho U}\right| < M^{2/3}. \qquad (25)$$

As $N_c = \sum_{k=1}^{MB} Z_k$, where $Z_k \sim \mathrm{Ber}(p)$ ($p = \nu_i/(\rho BU)$) are i.i.d.,
according to the Chernoff bound,

$$\Pr\left(N_c \ge \frac{\nu_i M}{\rho U} + M^{2/3}\right) \le e^{-MB\cdot I(a)}, \qquad (26)$$

where $a = \left(\frac{\nu_i M}{\rho U} + M^{2/3}\right)/(MB)$ and
$I(x) = \sup_\theta\{x\theta - \ln(\mathbb E[e^{\theta Z_k}])\}$ is the Cramér transform of the Bernoulli random
variable $Z_k$. Instead of directly deriving the RHS of inequality
(26), which can be done but needs a lot of calculations (see
Appendix G), we upper bound it by using a much simpler
approach here: For the same deviation, a classical upper bound
on the Chernoff bound of a binomial random variable is
provided by the Chernoff bound of a Poisson random variable
which has the same mean (see e.g. [5]). Therefore, the RHS
of inequality (26) can be upper bounded by

$$\exp\left(-\frac{\nu_i M}{\rho U}\,\bar I\!\left(\frac{\nu_i M/(\rho U) + M^{2/3}}{\nu_i M/(\rho U)}\right)\right),$$

where $\bar I(x)$ is the Cramér transform of a unit mean Poisson
random variable, i.e., $\bar I(x) = x\log x - x + 1$. By Taylor's
expansion of $\bar I(x)$ at $x = 1$, the exponent in the last expression
is equivalent to

$$-\frac{\nu_i M}{\rho U}\,\Theta\!\left(M^{-2/3}\right) = -\Theta\!\left(M^{1/3}\right).$$

On the other hand, when $M$ is large, $-M^{2/3} + \frac{\nu_i M}{\rho U} > 0$
holds, hence we have

$$\Pr\left(N_c \le -M^{2/3} + \frac{\nu_i M}{\rho U}\right) = \Pr\left(\sum_{k=1}^{MB}(-Z_k) \ge M^{2/3} - \frac{\nu_i M}{\rho U}\right) \le e^{-MB\cdot\tilde I(a)}, \qquad (27)$$

where $Z_k \sim \mathrm{Ber}(p)$, $\tilde I$ is the Cramér transform of $-Z_k$,
$a = M^{-1/3}/B - p \in [-1, 0]$ when $B$
is large, and it is easy to check that $\tilde I(a) = I(-a)$. Similarly
as above, by upper bounding $e^{-MB\cdot I(-a)}$, we find that the
exponent of the upper bound is also $-\Theta(M^{1/3})$. Therefore,

$$\Pr(\text{content } c \text{ is good}) \ge 1 - 2e^{-\Theta(M^{1/3})}. \qquad (28)$$

2) The number of "good contents" in each class



Denoting by $X_i$ the number of good contents in class $i$, we
want to use a corollary of the Azuma-Hoeffding inequality (see
e.g. Section 12.5.1 in [10] or Corollary 6.4 in [5]) to upper
bound the chance of its deviation from its mean. This corollary
applies to a function $f$ of independent variables $\xi_1, \ldots, \xi_n$, and
states that if the function changes by an amount no more than
some constant $c$ when only one component $\xi_j$ has its value
changed, then for all $t > 0$,

$$\Pr\bigl(|f(\xi) - \mathbb E[f(\xi)]| \ge t\bigr) \le 2e^{-2t^2/(nc^2)}.$$

Back to our problem, each independent variable $\xi_j$ corresponds
to the choice of a content to be placed in a particular
memory slot at a particular box (we index a slot by $j$ for
$1 \le j \le MB$), and $f(\xi)$ corresponds to the number of good
contents in class $i$ based on the placement $\xi$, i.e., $X_i = f(\xi)$.
It is easy to see that in our case $c = 1$, hence we have

$$\Pr\bigl(|X_i - \mathbb E[X_i]| \ge t\bigr) \le 2e^{-2t^2/(MB)}, \quad \forall t > 0.$$

Taking $t = (MB)^{2/3}$ in the above inequality further yields

$$\Pr\bigl(|X_i - \mathbb E[X_i]| \ge (MB)^{2/3}\bigr) \le 2e^{-2(MB)^{1/3}}.$$

Thus, we have

$$\begin{aligned}
\Pr\Bigl(X_i \ge \bigl(1 - 2e^{-\Theta(M^{1/3})}\bigr)\alpha_i B - (MB)^{2/3}\Bigr)
&\overset{(a)}{\ge} \Pr\bigl(X_i \ge \mathbb E[X_i] - (MB)^{2/3}\bigr)\\
&\ge \Pr\bigl(|X_i - \mathbb E[X_i]| \le (MB)^{2/3}\bigr)\\
&\ge 1 - 2e^{-2(MB)^{1/3}},
\end{aligned} \qquad (29)$$

where (a) holds since

$$\mathbb E[X_i] = \Pr(\text{content } c \text{ is good})\cdot\alpha_i B \ge \bigl(1 - 2e^{-\Theta(M^{1/3})}\bigr)\alpha_i B.$$

Note that in order for the lower bound on $X_i$ shown in the
above probability to be $\Theta(B)$, $M = o(B^{1/2})$ is a sufficient
condition.

3) The chance for a box to be "good"

We call a replica "good" if it is a replica of a good content,
and use $C_i$ to denote the number of good replicas of class $i$.
We also call a box "good" if, for each class $i$, the number of good
replicas of class $i$ held by this box lies within

$$\frac{\alpha_i\nu_i M}{\rho U} \pm O(M^{2/3}).$$

As we did for "good contents," we will also use the Chernoff
bound to prove that a box is good with high probability.

Let $\mathcal E_i$ represent the event that the number $X_i$ of good
contents within class $i$ satisfies

$$X_i \ge \bigl(1 - 2e^{-\Theta(M^{1/3})}\bigr)\alpha_i B - (MB)^{2/3}, \qquad (30)$$

which has a probability of at least $1 - 2e^{-2(MB)^{1/3}}$ according
to inequality (29), when $M = o(B^{1/2})$. Conditional on
$\mathcal E_i$, according to the lower bound in inequality (25) (i.e., the
definition of "good contents") and inequality (30), we have

$$C_i \ge \left(\frac{\nu_i M}{\rho U} - M^{2/3}\right)\left[\bigl(1 - 2e^{-\Theta(M^{1/3})}\bigr)\alpha_i B - (MB)^{2/3}\right]
= MB\cdot\frac{\alpha_i\nu_i}{\rho U}\Bigl(1 - O\bigl(M^{-1/3} + M^{2/3}B^{-1/3}\bigr)\Bigr). \qquad (31)$$

On the other hand, from the upper bound in inequality (25)
and the fact $X_i \le \alpha_i B$, we obtain that

$$C_i \le MB\cdot\frac{\alpha_i\nu_i}{\rho U}\bigl(1 + O(M^{-1/3})\bigr). \qquad (32)$$
Conditional on $\mathcal E_i$, to constitute a box, sample without
replacement from the determined content replicas. Denote the
number of good replicas of class $i$ stored in a particular box
(say, box $b$) by $\zeta_i$, which actually represents the number of
good replicas in the $M$ samples sampled without replacement
from all the $MB$ replicas, among which $C_i$ are good ones
(conditional on $\mathcal E_i$). This means that, conditional on $C_i$, $\zeta_i$
follows a hypergeometric distribution $H(MB, C_i, M)$. It can
be found that (see e.g. Theorem 1 in ID) conditional on $\mathcal E_i$,
$H_i \le_{st} \zeta_i \le_{st} G_i$. Here, "$\le_{st}$" represents stochastic ordering,
and

$$G_i \sim \mathrm{Bin}\left(M,\ \frac{\alpha_i\nu_i}{\rho U}\bigl(1 + O(M^{-1/3})\bigr)\right), \qquad
H_i \sim \mathrm{Bin}\left(M,\ \frac{\alpha_i\nu_i}{\rho U}\Bigl(1 - O\bigl(M^{-1/3} + M^{2/3}B^{-1/3}\bigr)\Bigr)\right),$$

where the second parameters of the distributions of $G_i$ and
$H_i$ are determined according to inequalities (32) and (31)
respectively.

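We will see why we need these two "binomial bounds" on
$\zeta_i$. By definition,

$$\Pr(\text{box } b \text{ is not good}) = \Pr\left(\bigcup_{i\in\mathcal I}\left\{\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3})\right\}\right)
\le \sum_{i\in\mathcal I}\Pr\left(\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3})\right), \qquad (33)$$

and

$$\begin{aligned}
\Pr\left(\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3})\right)
&= \Pr\left(\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3}) \,\Big|\, \mathcal E_i\right)\Pr(\mathcal E_i)
+ \Pr\left(\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3}) \,\Big|\, \bar{\mathcal E}_i\right)\Pr(\bar{\mathcal E}_i)\\
&\le \Pr\left(\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3}) \,\Big|\, \mathcal E_i\right)\cdot\Pr(\mathcal E_i) + 1\cdot\bigl(1 - \Pr(\mathcal E_i)\bigr).
\end{aligned} \qquad (34)$$

By definition of stochastic ordering,

$$\begin{aligned}
\Pr\left(\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3}) \,\Big|\, \mathcal E_i\right)
&\le \Pr\left(G_i > \frac{\alpha_i\nu_i M}{\rho U} + O(M^{2/3})\right) + \Pr\left(H_i < \frac{\alpha_i\nu_i M}{\rho U} - O(M^{2/3})\right)\\
&\overset{(a)}{\le} 2e^{-\Theta(M^{1/3})},
\end{aligned}$$

where (a) can be obtained using a similar Chernoff bounding
approach as for $N_c$ in Stage 1 of this proof. Thus, continuing
from inequality (34), we further have

$$\begin{aligned}
\Pr\left(\left|\zeta_i - \frac{\alpha_i\nu_i M}{\rho U}\right| > O(M^{2/3})\right)
&\le 2e^{-\Theta(M^{1/3})}\cdot\Pr(\mathcal E_i) + \bigl(1 - \Pr(\mathcal E_i)\bigr)\\
&= 1 - \bigl(1 - 2e^{-\Theta(M^{1/3})}\bigr)\Pr(\mathcal E_i)\\
&\le 1 - \bigl(1 - 2e^{-\Theta(M^{1/3})}\bigr)\bigl(1 - 2e^{-\Omega((MB)^{1/3})}\bigr)\\
&\le 2e^{-\Theta(M^{1/3})} + 2e^{-\Omega((MB)^{1/3})}.
\end{aligned} \qquad (35)$$

Putting inequality (35) back into inequality (33) immediately
results in

$$\Pr(\text{box } b \text{ is good}) \ge 1 - 2|\mathcal I|e^{-\Theta(M^{1/3})}. \qquad (36)$$

4) The number of "good boxes"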



We use a similar approach as in Stage 2 to bound the
number of good boxes, say $Y$, which can be represented as a
function $g(\xi)$ where $\xi = (\xi_1, \xi_2, \ldots, \xi_{MB})$ is the same content
placement vector defined in Stage 2. Still, $g(\xi)$ changes by an
amount no more than 1 when only one component has its
value changed, so for all $t > 0$,
$\Pr(|Y - \mathbb E[Y]| \ge t) \le 2e^{-2t^2/(MB)}$, and taking $t = (MB)^{2/3}$ further yields

$$\Pr\bigl(|Y - \mathbb E[Y]| \ge (MB)^{2/3}\bigr) \le 2e^{-2(MB)^{1/3}}.$$

Similarly to how we obtained inequality (29), we finally come to

$$\Pr\Bigl(Y \ge B\bigl(1 - 2|\mathcal I|e^{-\Theta(M^{1/3})}\bigr) - (MB)^{2/3}\Bigr) \ge 1 - 2e^{-2(MB)^{1/3}}. \qquad (37)$$



5) The performance of a loss network

Finally, consider the performance of the loss network
defined by the "Counter-Based Acceptance Rule." We
introduce an auxiliary system to establish an upper bound
on the rejection rate. In the auxiliary system, upon arrival
of a request for content $c$, $L$ different requests are mapped
to $L$ distinct boxes holding a replica of $c$, but here they are
accepted or rejected individually rather than jointly. Letting
$Z_b$ (respectively, $Z'_b$) denote the number of requests associated
to box $b$ in the original (respectively, auxiliary) system, one
readily sees that $Z_b \le Z'_b$ at all times and for all boxes, and that for
each box $b$, the process $Z'_b$ evolves as a one-dimensional loss
network. We now want to upper bound the overall arrival rate
of requests to a good box:

(a) Non-good contents

Assume that upon a request arrival, we indeed pick $L$
content replicas, rather than $L$ distinct boxes holding the
requested content (as specified in the acceptance rule). This
entails that, if two replicas of this content are present at one
box, then this box can be picked twice. However, since a
vanishing fraction of boxes will have more than one replica
of the same content when $M \ll \sqrt{(\min_i \alpha_i)B}$ (as proved
in Lemma 2), we can strengthen the definition of a "good"
box to ensure that, on top of the previous properties, a good
box should hold $M$ distinct replicas. It is easy to see that the
fraction of good boxes will still be of the same order as with
the original weaker definition.

With these modified definitions, consider one non-good
content $c$ of class $i$ cached at a good box. Its unique replica
will be picked with probability $L/N_c$ when the sampling of
$L$ replicas among the $N_c$ existing ones is performed. Thus,
since we ignore requests for all contents $c$ with $N_c < M^{3/4}$
(according to Assumption 1), the request rate will be at most
$\nu_i L/M^{3/4}$.

Besides, there are at most $O(M^{2/3})$ non-good content
replicas held by one good box. The reason is as follows: By
definition, a good box holds at least

$$\sum_{i\in\mathcal I}\left(\frac{\alpha_i\nu_i M}{\rho U} - O(M^{2/3})\right) = M - O(M^{2/3}) \qquad (38)$$

good content replicas among all classes, so the remaining
slots, being occupied by non-good content replicas, are at most
$O(M^{2/3})$. Therefore, the overall arrival rate of requests for
non-good contents to a good box is upper bounded by

$$\nu_{\text{non-good}} = O\!\left(M^{2/3}\cdot\frac{L}{M^{3/4}}\right) = O\bigl(LM^{-1/12}\bigr). \qquad (39)$$



(b) Good contents

The rate generated by a good content $c$ of class $i$ is $\nu_i L/N_c$.
Now, by definition of a good content, one has:

$$N_c \ge \frac{\nu_i M}{\rho U}\bigl(1 - O(M^{-1/3})\bigr).$$

This entails that the rate of requests for this content is upper
bounded by

$$\frac{\rho LU}{M}\cdot\bigl(1 + O(M^{-1/3})\bigr).$$

By definition of a "good box," there are at most $\alpha_i\nu_i M/(\rho U) + O(M^{2/3})$
good content replicas of class $i$ cached in this good
box. Therefore, the overall arrival rate of requests for good
contents to a good box is upper bounded by

$$\nu_{\text{good}} \le \sum_{i\in\mathcal I}\alpha_i\nu_i L\bigl(1 + O(M^{-1/3})\bigr) = (\rho LU)\bigl(1 + O(M^{-1/3})\bigr). \qquad (40)$$

To conclude, for any good box $b$, the process $Z'_b$ evolves
as a one-dimensional loss network with arrival rate no larger
than

$$\nu = \nu_{\text{non-good}} + \nu_{\text{good}} = \rho LU + O\bigl(LM^{-1/12}\bigr),$$

by combining the two results in (39) and (40).

Next, we are going to upper bound the loss probability of $\tilde{Z}_b$. Since $\nu$ is an upper bound on the arrival rate, the probability that $\tilde{Z}_b = LU$ is upper bounded by $E(\rho LU + O(LM^{-1/12}), LU)$. One can actually further upper bound this Erlang function by $e^{-\Theta(L)}$. To see this, let us first rewrite the loss probability (Erlang function) of a general 1-D loss network, say $E(\lambda, C)$, as a certain conditional probability of $S \sim \mathrm{Poi}(\lambda)$, i.e.,

$$E(\lambda, C) = \Pr(S = C \mid S \le C) = \frac{\Pr(S = C)}{\Pr(S \le C)}.$$

Using the Chernoff bound, we have $\Pr(S \ge C) \le e^{-\lambda I(C/\lambda)}$ for $C > \lambda$, where $I(x) = x \log x - x + 1$, hence

$$E(\lambda, C) \le \frac{\Pr(S \ge C)}{1 - \Pr(S > C)} \le \frac{e^{-\lambda I(C/\lambda)}}{1 - e^{-\lambda I(C/\lambda)}}.$$

Back to the Erlang function in our problem, $I(C/\lambda) = I((\rho + O(M^{-1/12}))^{-1})$, hence,

$$\Pr(\tilde{Z}_b = LU) \le E(\rho LU + O(LM^{-1/12}), LU) \le e^{-\Theta(L)}, \qquad (41)$$

where the second inequality holds under the assumption that $\rho < 1$ (otherwise, the exponent will become 0 or $+\Theta(L)$).
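
As a numerical illustration of the bound above (not part of the proof), the following Python sketch compares the exact Erlang loss probability with the Chernoff-type upper bound for a capacity $C = LU$ and load $\lambda = \rho LU$. The function names and the sample values $\rho = 0.8$, $U = 2$ are assumptions chosen for this example, and the $O(LM^{-1/12})$ correction to the arrival rate is ignored.

```python
import math


def erlang_b(lam: float, C: int) -> float:
    """Erlang loss probability E(lam, C), via the standard stable recursion."""
    b = 1.0  # E(lam, 0) = 1
    for k in range(1, C + 1):
        b = lam * b / (k + lam * b)
    return b


def rate_function(x: float) -> float:
    """I(x) = x log x - x + 1, as used in the Chernoff bound for Poisson variables."""
    return x * math.log(x) - x + 1.0


def chernoff_ratio_bound(lam: float, C: int) -> float:
    """Upper bound t / (1 - t) on E(lam, C), with t = exp(-lam * I(C / lam)); needs C > lam."""
    t = math.exp(-lam * rate_function(C / lam))
    return t / (1.0 - t)


if __name__ == "__main__":
    rho, U = 0.8, 2  # illustrative values only
    for L in (5, 20, 80, 320):
        C = L * U
        lam = rho * C
        print(L, erlang_b(lam, C), chernoff_ratio_bound(lam, C))
```

For small $L$ the bound is loose (it can exceed 1), but both quantities decay exponentially in $L$ when $\rho < 1$, which is the property used in (41).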

The number of good replicas in good boxes is, due to 
inequality (O and equation at least $MB(1 - O(M^{-1/3}))$,
with a high probability (at least 1 - 2e^'^'^^'^'>^'^). On the other 
hand, the total number of replicas of good contents is at most 
MB, which is the total number of replicas (or available cache 
slots). 

Now pick some small $\epsilon \in (0, 1/3)$ and let $X$ denote the
number of good contents which have at least $M^{2/3+\epsilon}$ replicas
outside good boxes. Then necessarily, with a probability of at
least 1 - 2e-2(^^^)'''\ 



$$X M^{2/3+\epsilon} \le MB - MB\left(1 - O(M^{-1/3})\right) = O(BM^{2/3}),$$

i.e., $X \le O(BM^{-\epsilon})$. According to inequality (29), the total
number of good contents is $\Theta(B)$ (specifically, very close to
\C\ ^ aB) with a probability of at least 1 - 2\I\e-^^^^^^"\ 
hence we can conclude that, with high probability, for a
fraction of at least $1 - O(M^{-\epsilon})$ of good contents, each of them
has at least a fraction $1 - O(M^{-1/3+\epsilon})$ of its replicas stored
in good boxes (since a good content has $\nu_i M/(\rho U) \pm O(M^{2/3})$
replicas in total by definition). We further use $\tilde{\mathcal{C}}$ to represent
the set of such contents.

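Recall that $A_c$ was defined in Subsection IV-B as the steady-state probability of accepting a request for content $c$ in the original system. For all $c \in \tilde{\mathcal{C}}$,

$$\begin{aligned}
A_c &\ge \Pr(\text{all the } L \text{ sampled replicas are in good boxes}) \times \Pr(Z_b < LU,\ \forall b \text{ s.t. box } b \text{ is sampled}) \\
&\overset{(a)}{\ge} \left(1 - O(M^{-1/3+\epsilon})\right)^{L} \times \Pr(\tilde{Z}_b < LU,\ \forall b \text{ s.t. box } b \text{ is sampled}) \\
&\overset{(b)}{\ge} \left(1 - O(M^{-1/3+\epsilon})\right)^{L} \cdot \left(1 - L e^{-\Theta(L)}\right).
\end{aligned} \qquad (42)$$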

Here, (b) is obtained according to inequality (41). The argument why (a) holds is as follows: We have $N_c \approx \nu_i M/(\rho U)$ replicas (assuming that content $c$ is of class $i$), among which $N'_c = N_c(1 - O(M^{-1/3+\epsilon}))$ are in good boxes. Then, the probability that $L$ samples fall in the good boxes can be written explicitly as

$$\frac{N'_c (N'_c - 1) \cdots (N'_c - L + 1)}{N_c (N_c - 1) \cdots (N_c - L + 1)},$$

which can be approximated as the first factor on the RHS above, under the assumption that $L \ll M$. The second factor is due to the fact that $Z_b \le \tilde{Z}_b$ for every box $b$.
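
To make the approximation in step (a) concrete, here is a small Python sketch that evaluates the exact probability that $L$ replicas sampled without replacement all fall among the replicas held by good boxes, and compares it with the power $(N'_c/N_c)^L$. The function name and the numbers are illustrative assumptions, not values from the analysis.

```python
from math import prod


def prob_all_in_good(n_total: int, n_good: int, L: int) -> float:
    """Exact probability that L replicas drawn without replacement from
    n_total replicas all lie among the n_good replicas stored in good boxes."""
    return prod((n_good - k) / (n_total - k) for k in range(L))


if __name__ == "__main__":
    n_total, n_good = 10_000, 9_900  # illustrative: 1% of replicas outside good boxes
    for L in (1, 5, 20):
        exact = prob_all_in_good(n_total, n_good, L)
        approx = (n_good / n_total) ** L
        print(L, exact, approx)
```

When $L$ is much smaller than the number of replicas, the two quantities nearly coincide, which is why the product can be replaced by the $L$-th power.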

Recall that, at this stage of the proof, arriving at inequality (42) requires everything to be conditional on the following events:

• The number of good boxes is $\Theta(B)$;

• The number of good contents is $\Theta(B)$;

• A box caches $M$ distinct replicas,

and as B, M — )■ oo and I\I <C a/ (min^ ai)B, all of them 
have high probabilities. Additionally, C A C as B, M — oo. 
Therefore, further letting $L \to \infty$ but keeping $L \ll M^{1/3-\epsilon}$, we find that the RHS of inequality (42) is approximated as

$$1 - O(LM^{-1/3+\epsilon}) - L e^{-\Theta(L)} \approx 1,$$

and then conclude that the requests for almost all the contents will have near-zero loss.

B. Proof of Equivalence between Feasibility Conditions (1)
and (2)

1) Sufficiency of Condition (2): We use Hall's theorem to
prove the sufficiency.

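[Hall's theorem] Suppose $\mathcal{J} = \{J_1, J_2, \ldots\}$ is a collection of sets (not necessarily countable). An SDR ("System of Distinct Representatives") for $\mathcal{J}$ is defined as $X = \{x_1, x_2, \ldots\}$, where the $x_i$ are distinct and $x_i \in J_i$. Then, there exists an SDR (not necessarily unique) iff $\mathcal{J}$ meets the following condition:

$$\forall\, \mathcal{T} \subseteq \mathcal{J}, \quad |\mathcal{T}| \le \Big| \bigcup_{A \in \mathcal{T}} A \Big|. \qquad (43)$$

□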

In our P2P VoD system, denote the content set as $\mathcal{C} = \{c_1, c_2, \ldots, c_N\}$. Given the ongoing download services of each content $\{n_i\}_{i=1}^{N}$, we get a "distinguishable content set"

$$\hat{\mathcal{C}} = \{c_1^{(1)}, c_1^{(2)}, \ldots, c_1^{(n_1)}, c_2^{(1)}, c_2^{(2)}, \ldots, c_2^{(n_2)}, \ldots, c_N^{(1)}, c_N^{(2)}, \ldots, c_N^{(n_N)}\},$$

where $c_i^{(k)}$ represents the $k$-th download service of content $i$ for $1 \le k \le n_i$, and has its "potential connection set"

$$J_i^{(k)} = \{(b, j) : 1 \le j \le U,\ c_i \in b,\ b \in \mathcal{B}\},$$

i.e., the set of all the connections of those boxes which have content $c_i$. A collection of the "potential connection sets" for all $\{c_i^{(k)}\}$ is then

$$\mathcal{J} = \{J_1^{(1)}, J_1^{(2)}, \ldots, J_1^{(n_1)}, \ldots, J_N^{(1)}, J_N^{(2)}, \ldots, J_N^{(n_N)}\},$$

and an SDR for $\mathcal{J}$ is

$$X = \{x_1^{(1)}, x_1^{(2)}, \ldots, x_1^{(n_1)}, \ldots, x_N^{(1)}, x_N^{(2)}, \ldots, x_N^{(n_N)}\}$$

s.t. $x_i^{(k)} \in J_i^{(k)}$, which means each $c_i^{(k)}$ is affiliated with a distinct connection (i.e., a feasible solution in our model).

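Now we want to prove the existence of such an SDR, i.e., to prove condition (43). For every $\mathcal{T} \subseteq \mathcal{J}$, there is a one-to-one mapping between $\mathcal{T}$ and a subset $\hat{S} \subseteq \hat{\mathcal{C}}$. Further, this $\hat{S}$ can be mapped to an $S \subseteq \mathcal{C}$ where

$$S = \{c_i : \exists\, 1 \le k \le n_i \text{ s.t. } c_i^{(k)} \in \hat{S}\},$$

i.e., $S$ is the set of all contents considered in $\hat{S}$ without considering multiple services of each content. Then, $\forall\, \mathcal{T} \subseteq \mathcal{J}$,

$$\text{RHS} = \Big| \bigcup_{J_i^{(k)} \in \mathcal{T}} J_i^{(k)} \Big| = \sum_{b:\, \exists c_i \in S \text{ s.t. } c_i \in b} U = U\, \big|\{b \in \mathcal{B} : S \cap J_b \ne \emptyset\}\big|$$

and

$$\text{LHS} = |\mathcal{T}| = |\hat{S}| \le \sum_{c_i \in S} n_i.$$

Therefore, if

$$\forall\, S \subseteq \mathcal{C}, \quad \sum_{c_i \in S} n_i \le U\, \big|\{b \in \mathcal{B} : S \cap J_b \ne \emptyset\}\big|$$

holds, then condition (43) holds. The sufficiency is proved.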
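2) Necessity of Condition (2): For any $S \subseteq \mathcal{C}$,

$$\sum_{c \in S} n_c = \sum_{c \in S} \sum_{b:\, c \in J_b} z_{cb} = \sum_{b:\, \exists c \in S \text{ s.t. } c \in J_b}\ \sum_{c \in S \cap J_b} z_{cb} \overset{(a)}{\le} \sum_{b:\, \exists c \in S \text{ s.t. } c \in J_b} U = U\, \big|\{b \in \mathcal{B} : S \cap J_b \ne \emptyset\}\big|,$$

where the inequality (a) is due to the second constraint in condition (1). Hence, the necessity is proved.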
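
As an illustrative sanity check of this equivalence on a toy instance (not part of the proof), the following Python sketch compares the subset condition with a direct search for an assignment of ongoing downloads to distinct box connections. The function names, the data layout (caches as a list of sets, demands as a dictionary) and the use of a simple augmenting-path matching are assumptions made for this example.

```python
from itertools import combinations


def hall_condition(caches, U, n):
    """Check that, for every subset S of contents, the total demand of S is at
    most U times the number of boxes caching at least one content of S."""
    contents = sorted(n)
    for r in range(1, len(contents) + 1):
        for S in combinations(contents, r):
            demand = sum(n[c] for c in S)
            boxes = sum(1 for J in caches if J & set(S))
            if demand > U * boxes:
                return False
    return True


def feasible(caches, U, n):
    """Search for an assignment of every ongoing download to its own
    (box, connection) slot, using augmenting paths (bipartite matching)."""
    downloads = [c for c in sorted(n) for _ in range(n[c])]
    slots = [(b, j) for b in range(len(caches)) for j in range(U)]
    owner = {}  # slot -> index of the download currently using it

    def try_assign(d, visited):
        for s in slots:
            if downloads[d] in caches[s[0]] and s not in visited:
                visited.add(s)
                # Take a free slot, or move the current owner elsewhere.
                if s not in owner or try_assign(owner[s], visited):
                    owner[s] = d
                    return True
        return False

    return all(try_assign(d, set()) for d in range(len(downloads)))


if __name__ == "__main__":
    caches = [{1, 2}, {2, 3}]  # box 0 caches contents 1, 2; box 1 caches 2, 3
    demand = {1: 1, 2: 1, 3: 0}
    print(hall_condition(caches, 1, demand), feasible(caches, 1, demand))
```

The brute-force subset enumeration is exponential in the catalogue size, so this is only meant for very small examples.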

C. Approximation to Proportional-to-Product Placement Using Bernoulli Sampling

An alternative sampling strategy to get the proportional-to-product placement is as follows:

To push contents to box $b$ ($1 \le b \le B$), the server will

1. Generate $C$ independent Bernoulli random variables $X_c \sim \mathrm{Ber}(p_c)$ for all $c \in \mathcal{C}$, where $p_c = \beta \bar{\nu}_c / (1 + \beta \bar{\nu}_c)$, $\bar{\nu}_c$ is the normalized version of $\nu_c$, and $\beta$ is a customized constant parameter.

2. If $\sum_{c \in \mathcal{C}} X_c = M$ (which means a valid cluster of size $M$ is generated), push content $c$ to box $b$ if $X_c = 1$; otherwise, go back to Step 1.



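We now analyze why this scheme works: after generating a valid size-$M$ subset, the probability that this subset is a certain subset $\mathcal{C}_j$ equals

$$\Pr\Big(X_c = 1,\ \forall c \in \mathcal{C}_j;\ X_c = 0,\ \forall c \notin \mathcal{C}_j \,\Big|\, \sum_{c \in \mathcal{C}} X_c = M\Big) = \frac{\prod_{c \in \mathcal{C}_j} p_c \prod_{c \notin \mathcal{C}_j} \bar{p}_c}{\Pr\big(\sum_{c \in \mathcal{C}} X_c = M\big)} = \prod_{c \in \mathcal{C}_j} \frac{p_c}{\bar{p}_c} \cdot \frac{\prod_{c \in \mathcal{C}} \bar{p}_c}{\Pr\big(\sum_{c \in \mathcal{C}} X_c = M\big)} = \prod_{c \in \mathcal{C}_j} \bar{\nu}_c / Z,$$

where $\bar{p}_c = 1 - p_c$ and $Z = \Pr(\sum_{c \in \mathcal{C}} X_c = M) / (\beta^M \prod_{c \in \mathcal{C}} \bar{p}_c)$, which actually equals the normalizing factor for $\prod_{c \in \mathcal{C}_j} \bar{\nu}_c$.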
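
The two steps of the Bernoulli sampling scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the list-based representation of popularities and the explicit random generator argument are not from the original text.

```python
import random


def bernoulli_placement(nu, M, beta, rng):
    """Draw one size-M content subset by Bernoulli sampling with rejection.

    nu   : list of (unnormalized) content popularities
    M    : cache size, i.e. required subset size
    beta : tuning constant of the scheme
    rng  : a random.Random instance
    Returns the list of selected content indices.
    """
    total = sum(nu)
    nu_bar = [v / total for v in nu]                      # normalized popularities
    p = [beta * v / (1.0 + beta * v) for v in nu_bar]     # p_c = beta*nu_c / (1 + beta*nu_c)
    while True:
        chosen = [c for c, pc in enumerate(p) if rng.random() < pc]
        if len(chosen) == M:   # valid cluster of size M; otherwise resample
            return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    print(bernoulli_placement([3.0, 2.0, 1.0], M=2, beta=2.0, rng=rng))
```

With popularities 3, 2, 1 and $M = 2$, the three possible subsets should appear with frequencies proportional to the products 6, 3 and 2, independently of the value of $\beta$; only the acceptance rate depends on $\beta$.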

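We then consider the computational complexity of this approximation algorithm. Assuming that $\{p_c\}$ is sorted in the descending order, we have

$$\Pr\Big(\sum_{c \in \mathcal{C}} X_c = M\Big) \ge \prod_{c=1}^{M} p_c \prod_{c=M+1}^{C} \bar{p}_c \triangleq P^*,$$

where $\bar{p}_c = 1 - p_c$. So the computational complexity is upper bounded by $O(BC/P^*)$. Note that the constant parameter $\beta$ can be adjusted to get a higher $\Pr(\sum_{c \in \mathcal{C}} X_c = M)$ in order to reduce computational complexity. To achieve this, we can just choose a $\beta$ which maximizes its lower bound $P^*$, so

$$\sum_{c \in \mathcal{C}} \frac{\bar{\nu}_c}{1 + \beta \bar{\nu}_c} - \frac{M}{\beta} = 0. \qquad (44)$$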

The server can use any numerical method (e.g., Newton's method) to seek a root of equation (44). In fact, this lower bound $P^*$ on $\Pr(\sum_{c \in \mathcal{C}} X_c = M)$ is not tight, since it is just the largest term in the sum expression. When the popularity is close to uniform (e.g., a Zipf-like distribution with a small exponent $\alpha$), this largest term is no longer dominant, so the lower bound $P^*$ is quite loose, which means we actually overestimate the computational complexity by only evaluating its upper bound. However, this will not affect the real gain we obtain after choosing the optimal $\beta$ according to equation (44).
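
A possible way to tune $\beta$ numerically is sketched below in Python. Differentiating $\log P^*$ with respect to $\beta$ shows that the maximizer satisfies $\sum_{c} p_c(\beta) = M$, i.e., the expected number of selected contents equals $M$. Since the left-hand side is increasing in $\beta$, the sketch uses plain bisection instead of Newton's method; the function names and tolerance are assumptions made for this example, and it requires $0 < M < C$ with strictly positive popularities.

```python
def expected_size(nu_bar, beta):
    """Expected number of selected contents, sum_c p_c(beta)."""
    return sum(beta * v / (1.0 + beta * v) for v in nu_bar)


def lower_bound(nu_bar, M, beta):
    """P*: probability that exactly the M most popular contents are selected."""
    p = sorted((beta * v / (1.0 + beta * v) for v in nu_bar), reverse=True)
    result = 1.0
    for k, pc in enumerate(p):
        result *= pc if k < M else (1.0 - pc)
    return result


def optimal_beta(nu_bar, M, tol=1e-12):
    """Bisection on beta for expected_size(nu_bar, beta) = M."""
    lo, hi = 0.0, 1.0
    while expected_size(nu_bar, hi) < M:   # bracket the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_size(nu_bar, mid) < M:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    nu_bar = [0.4, 0.3, 0.2, 0.1]  # illustrative normalized popularities
    beta = optimal_beta(nu_bar, M=2)
    print(beta, lower_bound(nu_bar, 2, beta))
```

For a uniform catalogue of four contents and $M = 2$, the root is $\beta = 4$, which gives $p_c = 1/2$ for every content.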

Recall that we also proposed a simple sampling strategy in Section IV-A. It is easy to see that when some contents are much more popular than the others (e.g., the Zipf-like exponent $\alpha$ is large), the probability that duplicates appear in one size-$M$ sample is high, which greatly increases the number of resampling rounds. In that case, Bernoulli sampling is faster. However, when the popularity is quite uniform, the simple sampling works very well. An extreme case is the uniform popularity distribution, under which



Pr{a valid size-M subset} =



M-l 



c 



M 



which shows that when C is large, you can get a valid sample 
almost every time. 



D. Detailed Implementation in the Simulations 

1) A Heuristic Repacking Algorithm: We first describe the concept of "repacking." When the cache size $M = 1$, all the bandwidth resources at a given box belong to the content that the box caches. When $M \ge 2$, however, this is not the case: all the contents cached in one box are competitors for the bandwidth resources at that box. Consider a simple example in which $B = 2$, $M = 2$ and $U = 1$: Box 1, which caches contents 1 and 2, is serving a download of content 2, while box 2, which caches contents 2 and 3, is idle. When a request for content 1 arrives, the only potential candidate to serve it is box 1, but since its only connection is already occupied by a download of content 2, the request for content 1 has to be rejected. However, if this ongoing download can be "forwarded" to the idle box 2, the new request can be satisfied without breaking the old one. We call this type of forwarding "repacking."

In the feasibility condition (1) and its equivalent form (2), we actually allow perfect repacking to identify a feasible $\{n_c\}$. In a real system, perfect repacking needs to enumerate all the possible serving patterns and choose the best one based on some criterion, which is usually computationally infeasible. We therefore propose a heuristic repacking algorithm which is less complex but achieves similar functionality and improves performance, although imperfectly.

Several variables need to be defined before we describe the algorithm:

• $n_c$: the number of system-wide ongoing downloads of content $c$, not counting the downloads from the server.

• $\mathcal{B}_c^k$: the set of boxes which have content $c$ ("potential candidate boxes") and $k$ free connections, for $0 \le k \le U$.

• $D_c$: the number of boxes which have content $c$, i.e., $D_c = \sum_{k=0}^{U} |\mathcal{B}_c^k|$.

• $u_b$: a $U$-dimensional vector, of which the $i$-th component represents the content box $b$ is using its $i$-th connection to upload (a value of 0 represents a free connection).

• $c_o$: the "orphan content," which is affiliated with a new request or an ongoing download but has not been assigned to any box.

• $\mathcal{C}_o$: the set of contents which have once been chosen as orphan contents.

• $t_R$: the number of repacking operations already done.

Note that when choosing a box to serve a request, load balancing is already considered, which to some extent reduces the chance that repacking is needed in later operations. However, repacking is still needed for an incoming request for content $c$ as soon as $\bigcup_{k>0} \mathcal{B}_c^k = \emptyset$.

Repacking Algorithm

After getting a request for content $c$ while $\bigcup_{k>0} \mathcal{B}_c^k = \emptyset$, the server

1. Initialize $c_o := c$, $\mathcal{C}_o := \{c\}$, and $t_R := 0$.

2. Let $\mathcal{C}' = \{c' : n_{c'}/D_{c'} > n_{c_o}/D_{c_o} \text{ and } c' \notin \mathcal{C}_o\}$, i.e., the set of contents which have not become orphans during this repacking process and whose utilization factor (which may be larger than 1) is larger than that of the current orphan content $c_o$. If $\mathcal{C}' = \emptyset$, regard $c_o$ as a loss and TERMINATE.

3. Choose $c^* = \arg\max_{c' \in \mathcal{C}'} \{n_{c'}/D_{c'}\}$. Uniformly pick one (box, connection) pair from

$$\{(b, i) : b \in \mathcal{B}_{c_o}^0,\ c^* \text{ is the } i\text{-th component of } u_b\}.$$

4. Use the chosen box $b$ and its $i$-th connection to continue uploading the remaining part of content $c_o$. At the same time, $c^*$, which was served using that connection, becomes a new orphan, i.e., $c_o := c^*$. Update $u_b$ and $\{n_c\}$. Set $t_R := t_R + 1$.

5. If $\bigcup_{k>0} \mathcal{B}_{c_o}^k \ne \emptyset$, i.e., there exists a free connection to serve the new $c_o$, then use the load-balancing-based box selection rule to select a box to continue uploading the remaining part of $c_o$. The repacking process is perfect (no remaining orphan) and TERMINATE. Otherwise,

• If $t_R = t_R^{\max}$, a customized algorithm parameter ($0 < t_R^{\max} < C$), regard $c_o$ as a loss and TERMINATE.

• Otherwise, set $\mathcal{C}_o := \mathcal{C}_o \cup \{c_o\}$, and go to Step 2.
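
The steps above can be sketched in Python as follows. This is a simplified illustration, not the simulator's code: the function name `repack`, the data layout, and the load-balancing rule (pick the caching box with the most free connections) are assumptions. Two further assumptions are labeled in the comments: download counts are recomputed from the connection table, so an unassigned orphan is not counted, and the orphan is treated as lost when no box holding it is currently uploading $c^*$, a case the description leaves open.

```python
import random


def repack(caches, conns, c, t_max, rng=random):
    """Try to admit a request for content c, repacking if needed.

    caches : list of sets, caches[b] = contents stored at box b
    conns  : list of lists, conns[b][i] = content uploaded on connection i
             of box b, with 0 meaning a free connection (mutated in place)
    t_max  : maximum number of repacking operations
    Returns True if no orphan remains, False if some download is lost.
    """
    contents = sorted(set().union(*caches))
    D = {x: sum(1 for J in caches if x in J) for x in contents}
    if c not in D:
        return False  # nobody caches c

    def n(x):
        # Assumption: ongoing downloads are counted from the connection table.
        return sum(row.count(x) for row in conns)

    def place(x):
        # Assumed load-balancing rule: caching box with most free connections.
        boxes = [b for b in range(len(caches)) if x in caches[b]]
        best = max(boxes, key=lambda b: conns[b].count(0), default=None)
        if best is None or conns[best].count(0) == 0:
            return False
        conns[best][conns[best].index(0)] = x
        return True

    if place(c):
        return True
    orphan, seen, t = c, {c}, 0                       # Step 1
    while True:
        util = n(orphan) / D[orphan]
        cands = [x for x in contents                   # Step 2
                 if x not in seen and n(x) / D[x] > util]
        if not cands:
            return False
        c_star = max(cands, key=lambda x: n(x) / D[x])  # Step 3
        pairs = [(b, i) for b, J in enumerate(caches) if orphan in J
                 for i, v in enumerate(conns[b]) if v == c_star]
        if not pairs:
            return False  # assumption: treated as a loss
        b, i = rng.choice(pairs)
        conns[b][i] = orphan                           # Step 4
        orphan = c_star
        t += 1
        if place(orphan):                              # Step 5
            return True
        if t >= t_max:
            return False
        seen.add(orphan)


if __name__ == "__main__":
    caches = [{1, 2}, {2, 3}]   # the two-box example from the text
    conns = [[2], [0]]          # box 0 serves content 2, box 1 is idle
    print(repack(caches, conns, 1, t_max=3), conns)
```

On the two-box example, the download of content 2 is moved to the idle box and the new request for content 1 takes over the freed connection.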



2) A Practical Issue in Cache Update: When a box $b$ is chosen for cache update (and it does not hold the content $c$ corresponding to the request), it might still be uploading the content $c'$ which is to be replaced. This fact is not captured by the Markov chain model. In practice, those ongoing services must be terminated. Since we have introduced the repacking scheme, they become "orphans" ready for repacking. We implement the procedure as follows:

1. Rank these orphans by their remaining service time in ascending order, i.e., the original download which will complete sooner is given higher priority.

2. Do repacking one by one until one orphan fails to be repacked. Note that here the repacking algorithm starts from Step 5, since there may already be some boxes with both content $c'$ and free connections.



E. Proof of Theorem 2

The Lagrangian of OPT 2 is

$L(\tilde{m}, \lambda, x;\ u, v, y, z, w, \eta, \gamma)$



The KKT conditions include the feasible set defined in OPT 2 and the following:



dL 
drhc 





T 

oL 

dxc 




= 0, 


Vc; 


= Pc 


- Uc + 


Vc - PcZc - 7 


= 0, 


Vc; 


dL 


= —Vc 


+ 2/c — + 


— n 

— "7 


vc, 






Uc{lflc - 1) 


= 0, 


Uc > 0, Vc; 






Vc{\c - rhc) 


= 0, 


Vc > 0, Vc; 






ydxc - Ac) 


= 0, 


Vc > 0, Vc; 






- Pc + Pc-fric) 


= 0, 


Zc > 0, Vc; 






WcXc 


= 0, 


Wc > 0, Vc 



We then substitute the solution stated in the theorem into the KKT conditions to check whether they are satisfied. The analysis is as follows:

• For $1 \le c \le M - 1$, since $\tilde{m}_c = 1$ and $\lambda_c = x_c = 0$, we obtain that $v_c = 0$, $y_c + z_c = 1$, $\rho_c(1 - z_c) = u_c + \gamma$, and $y_c = \eta - w_c$. Letting $w_c = 0$, we further have: $u_c = \rho_c \eta - \gamma$, $y_c = \eta$, $z_c = 1 - \eta$. To keep $u_c, y_c, z_c \ge 0$, we must have $\eta \in [0, 1]$ and $\gamma \le \rho_c \eta$, for $1 \le c \le M - 1$. Thus, since $\{\rho_c\}$ are also ranked in the descending order, we have

$$\gamma \le \rho_{M-1}\, \eta. \qquad (45)$$

• For $M \le c \le c^*$, since $\tilde{m}_c = \lambda_c = x_c = \rho_c/(1 + \rho_c)$, we obtain that $u_c = w_c = 0$, $y_c + z_c = 1$, $\rho_c(1 - z_c) = \gamma - v_c$, $y_c = \eta + v_c$. We further have:

$$v_c = \frac{\gamma - \rho_c \eta}{\rho_c + 1}, \qquad y_c = \frac{\eta + \gamma}{\rho_c + 1}, \qquad z_c = 1 - \frac{\eta + \gamma}{\rho_c + 1}.$$

To keep $v_c, y_c, z_c \ge 0$, we must have $\rho_c \eta \le \gamma \le \rho_c + 1 - \eta$, for $M \le c \le c^*$. Thus,

$$\rho_M\, \eta \le \gamma \le \rho_{c^*} + 1 - \eta. \qquad (46)$$

• For $c = c^* + 1$, when $\tilde{m}_c = 0$, it degenerates to the next case. When $\tilde{m}_c > 0$, since $\tilde{m}_c = \lambda_c = x_c < \rho_c(1 - \tilde{m}_c)$, we obtain that $u_c = w_c = z_c = 0$, $y_c = 1$, $\rho_c + v_c = \gamma$, $\eta + v_c = 1$. We further have

$$\gamma = \rho_{c^*+1} + 1 - \eta. \qquad (47)$$

• For $c^* + 2 \le c \le C$, since $\tilde{m}_c = \lambda_c = x_c = 0$, we obtain that $u_c = z_c = 0$, $y_c = 1$, $v_c = \gamma - \rho_c$, $w_c = \eta + v_c - 1 = \eta + \gamma - \rho_c - 1$. To keep $v_c, w_c \ge 0$, and due to the fact that $\eta \in [0, 1]$, we must have $\gamma \ge \rho_c$, for $c^* + 2 \le c \le C$. Thus,

$$\gamma \ge \rho_{c^*+2}. \qquad (48)$$

For inequalities (45), (46), (48) and equation (47) to hold simultaneously, we can choose an $\eta$ which satisfies

$$\frac{\rho_{c^*+1} + 1}{\rho_{M-1} + 1} \le \eta \le \frac{\rho_{c^*+1} + 1}{\rho_M + 1},$$

which also satisfies $\eta \in [0, 1]$. Therefore, the theorem is proved.
It should be mentioned that when $\sum_{c=M}^{c^*} \tilde{m}_c = 1$, i.e., $\tilde{m}_{c^*+1} = 0$, the case "$c = c^* + 1$" can be combined with the next case "$c^* + 2 \le c \le C$", hence equation (47) does not exist while inequality (48) is changed to $\gamma \ge \rho_{c^*+1}$. Then, we can just choose an $\eta$ which satisfies

o<^<^£:±i±I. 

F. Storage of Segments and Parallel Substreaming

We have mentioned before that, compared to the "storage of complete contents and downloads by single streaming" setting, a more widely used mechanism in practice is that each box stores one specific segment of a video content and a download (streaming) comprises parallel substreams from different boxes. To model this mechanism, we make the following simplifying assumptions: Each content is divided into $K$ segments of equal length, which are stored independently. Each box can store up to $M$ segments (it does not matter if we instead keep the original storage space of each box, i.e., $M$ complete contents, which can now hold $MK$ segments, since the storage space is a customized parameter), and these $M$ segments do not necessarily belong to $M$ distinct contents. The bandwidth of each box is kept as $U$, so each box can now accommodate $UK$ parallel substreams, each with download rate $1/K$ (the average service duration is still 1 because each segment is $1/K$ of the original content length). The definition of "traffic load" $\rho$ is then the same as in equation (6). A request for a content will be divided into sub-requests submitted to the boxes holding the corresponding segments of this content, generating $K$ parallel substreams in total (one box can serve more than one substream for this request if it caches more than one distinct segment of this content).

Let $\theta$ represent a segment and $\theta \in c$ indicate that $\theta$ is a segment of content $c$. Recall that we use $n_c$ to denote the number of concurrent downloads (now called "streams") of content $c$ in the network. We further use $n_\theta$ to denote the number of substreams corresponding to segment $\theta$.

Now the original feasibility constraint (1) becomes

$$\sum_{b:\, \theta \in J_b} z_{\theta b} = n_\theta,\ \forall\, \theta \in \Theta; \qquad \sum_{\theta:\, \theta \in J_b} z_{\theta b} \le UK,\ \forall\, b \in \mathcal{B}, \qquad (49)$$

where $\Theta$ represents the whole set of segments and $z_{\theta b}$ denotes the number of concurrent substreams downloading segment $\theta$ from box $b$. It is easy to see that the equivalent version which can be proved by Hall's theorem becomes:

$$\forall\, S \subseteq \Theta, \quad \sum_{\theta \in S} n_\theta \le KU\, \big|\{b \in \mathcal{B} : S \cap J_b \ne \emptyset\}\big|, \qquad (50)$$

where with a little abuse of notation, $S$ is used to denote a subset of $\Theta$, instead of $\mathcal{C}$ as before.

Since we have assumed that video duration and video streaming rate are all the same, one naturally has $n_\theta = n_c$ for all $\theta \in c$. If we let randomness exist in the service duration, then within one stream, some substreams may complete earlier than the others. Therefore, the above equality needs to be added as a constraint (and used to come up with the following result), i.e., the bandwidth for the $K$ substreams should be reserved until the whole streaming is completed.

Then, in the proof of the optimality of "proportional-to-product" placement for DSN, every expression stays the same, except that the feasibility constraint (10) is changed to

V5C e, ^ ^ x*"" < mjBUK, (51) 

eeSc-.eec j:jn<s#0 

and the "proportional-to-product" placement $\{m_j\}$ is now with respect to each segment, i.e., $m_j = \prod_{\theta \in j} \nu_\theta / Z$ for all $j \subseteq \Theta$ s.t. $|j| = M$, where $Z$ is the normalizing constant and $\nu_\theta = \nu_c$, $\forall\, \theta \in c$. With the observation that $\sum_{\theta \in \Theta} \nu_\theta = K \sum_{c \in \mathcal{C}} \nu_c = K$, we can still arrive at an inequality of the same form as inequality (11), except that $c$ and $\mathcal{C}$ are replaced by $\theta$ and $\Theta$ respectively. All the succeeding steps are exactly the same as in the proof of optimality.

G. Another Approach to Bound the Chance of "Good Contents" in Proving Theorem 3

At the first stage of proving Theorem 3, we mentioned that we can also directly derive the Chernoff bound on the RHS of inequality (26) to get the result. The derivation is given below:

Recall that $I(x) = \sup_\theta \{x\theta - \ln(E[e^{\theta Z_1}])\}$ is the Cramér transform of the Bernoulli random variable $Z_1$. It is easy to check that

$$I(x) = \begin{cases} x \ln \dfrac{x}{p} + (1 - x) \ln \dfrac{1 - x}{1 - p} & \text{if } x \in [0, 1], \\ +\infty & \text{else.} \end{cases}$$



Also recall that $a = (pMB + M^{2/3})/(MB) = p + M^{-1/3}/B$



where p = -j^jj- Since we are considering a large B, a £ [0, 1] 
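holds. Thus, denoting $\bar{p} = 1 - p$ for brevity, the exponent of the RHS of inequality (26) reads

$$\begin{aligned}
-MB \cdot I(a) &= -(pMB + M^{2/3}) \ln\left(1 + \frac{1}{pM^{1/3}B}\right) - (\bar{p}MB - M^{2/3}) \ln\left(1 - \frac{1}{\bar{p}M^{1/3}B}\right) \\
&= -\frac{pMB + M^{2/3}}{pM^{1/3}B} + \frac{pMB}{2(pM^{1/3}B)^2} + \frac{\bar{p}MB - M^{2/3}}{\bar{p}M^{1/3}B} + \frac{\bar{p}MB}{2(\bar{p}M^{1/3}B)^2} + o(M^{1/3}) \\
&= -\frac{M^{1/3}}{2 p \bar{p} B} + o(M^{1/3}) = -\Theta(M^{1/3}).
\end{aligned} \qquad (52)$$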



With similar steps as above, we can show that the exponent of the RHS of inequality (27) is also $-\Theta(M^{1/3})$. Therefore, inequality (28) is proved.
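
As a quick numerical check of this expansion, the following Python sketch evaluates the exact exponent $-MB \cdot I(a)$ with $a = p + M^{-1/3}/B$ and compares it with the leading term $-M^{1/3}/(2 p \bar{p} B)$. The function names and the sample values of $M$, $B$ and $p$ (chosen so that $pB$ is of order one) are assumptions made for illustration only.

```python
import math


def bernoulli_rate(x: float, p: float) -> float:
    """Cramer transform of a Bernoulli(p) variable, for x in (0, 1)."""
    return x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))


def exact_exponent(M: float, B: float, p: float) -> float:
    """-M*B*I(a) with a = p + M^(-1/3) / B."""
    a = p + M ** (-1.0 / 3.0) / B
    return -M * B * bernoulli_rate(a, p)


def leading_term(M: float, B: float, p: float) -> float:
    """Second-order approximation -M^(1/3) / (2 p (1 - p) B)."""
    return -M ** (1.0 / 3.0) / (2.0 * p * (1.0 - p) * B)


if __name__ == "__main__":
    B, p = 50, 0.02  # illustrative values with p*B = 1
    for M in (1e3, 1e6, 1e9):
        print(M, exact_exponent(M, B, p), leading_term(M, B, p))
```

The ratio of the two values approaches 1 as $M$ grows, consistent with an exponent of order $-M^{1/3}$.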



